<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="https://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by hundreds of R bloggers</description>
	<lastBuildDate>Fri, 03 Apr 2026 00:00:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.18</generator>

<image>
	<url>https://i0.wp.com/www.r-bloggers.com/wp-content/uploads/2016/08/cropped-R_single_01-200.png?fit=32%2C32&#038;ssl=1</url>
	<title>R-bloggers</title>
	<link>https://www.r-bloggers.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">11524731</site>	<item>
		<title>You can just build your own programming language</title>
		<link>https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/</link>
		
		<dc:creator><![CDATA[Econometrics and Free Software]]></dc:creator>
		<pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://b-rodrigues.github.io/posts/2026-04-03-tproject.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Last summer, while relaxing on the beaches of Berck, a French town known for treating tuberculosis in kids by exposing them to the fresh maritime air (back in the 19th century; there are antibiotics these days), I found myself daydreaming abou...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/">You can just build your own programming language</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://b-rodrigues.github.io/posts/2026-04-03-tproject.html"> Econometrics and Free Software</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 




<div style="text-align: center;">
<p>
<a> <img src="https://i0.wp.com/b-rodrigues.github.io/assets/img/tlogo.png?w=578&#038;ssl=1" style="width: 50%; height: auto;" data-recalc-dims="1"> </a>
</p>
</div>
<p>Last summer, while relaxing on the beaches of Berck, a French town known for treating tuberculosis in kids by exposing them to the fresh maritime air (back in the 19th century; there are antibiotics these days), I found myself daydreaming about building my own programming language.</p>
<p>Spoiler alert: I don’t know how to build programming languages, but I have developed extremely strong opinions over the years about the features a modern data science language <em>should</em> have. So could I use them fancy LLMs to build one?</p>
<p>Also, let’s get one question answered straight away: why create a new language instead of contributing to existing ones? I certainly do contribute: I maintain several R packages like <code>{rix}</code>, <code>{rixpress}</code>, and <code>{chronicler}</code>, and even two Python packages (<code>cronista</code> and <code>ryxpress</code>). But I wanted a clean slate to build a system centered around a few non-negotiable principles and features I’ve implemented over the years in R:</p>
<ul>
<li><strong>Reproducibility-First</strong>: A language where reproducibility isn’t a bolt-on afterthought managed by external tools, but the very foundation of the runtime.</li>
<li><strong>Aggressive Re-use</strong>: Instead of reinventing the wheel, this language would stand on the shoulders of giants. It’d use <strong>Nix</strong> for package management and environment isolation, and <strong>Apache Arrow</strong> as its high-performance backbone for data frames. R, Python, Julia and other languages would provide the algorithms and models.</li>
<li><strong>First-Class Pipelines</strong>: Scripts shouldn’t be a sequence of side-effects. In this language, pipelines would be mandatory and first-class citizens.</li>
<li><strong>Fail Early and Loudly</strong>: No silent type conversions or hidden NAs. If something is wrong, the language breaks immediately so you can fix it.</li>
<li><strong>Errors as Objects</strong>: Inspired by functional programming, errors are first-class values that can be inspected and handled gracefully.</li>
<li><strong>Two Pipes</strong>: I want two pipes, one for linear transformations, <code>|&gt;</code>, and a maybe-pipe, <code>?|&gt;</code> for error recovery. Unlike the standard pipe, <code>?|&gt;</code> always forwards its value, including Errors, to the next function, allowing you to write handlers that inspect and potentially recover from them. Since Errors are just values, this composes naturally with the rest of the language.</li>
<li><strong>Polyglot by Design</strong>: Rather than re-implementing every statistical algorithm, this language would be designed to orchestrate and bridge R, Python, and Julia seamlessly.</li>
</ul>
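<p>To make the "errors as objects" and two-pipe bullets concrete, here is a small Python sketch of the intended semantics. This is illustrative only, not T's implementation: the names <code>Err</code>, <code>pipe</code>, <code>maybe_pipe</code>, <code>parse_num</code>, and <code>recover_with_zero</code> are all made up for this example.</p>

```python
from dataclasses import dataclass

@dataclass
class Err:
    message: str  # an error is just a value carrying context

def pipe(value, fn):
    """Standard pipe |>: short-circuits once an Err appears."""
    if isinstance(value, Err):
        return value
    return fn(value)

def maybe_pipe(value, fn):
    """Maybe-pipe ?|>: always forwards the value, Err included,
    so the next function can inspect and possibly recover."""
    return fn(value)

def parse_num(s):
    try:
        return float(s)
    except ValueError:
        return Err(f"not a number: {s!r}")

def recover_with_zero(v):
    # A handler: because Err is a plain value, it composes like any function.
    return 0.0 if isinstance(v, Err) else v

result = maybe_pipe(pipe("oops", parse_num), recover_with_zero)
```

<p>Here the ordinary pipe carries the <code>Err</code> through untouched, and the maybe-pipe hands it to a handler that recovers with a default value.</p>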
<p>Also, we’re in a post-LLM world, and like them or not, they’re here to stay. They’re pretty useful for writing boilerplate code, so any new language would be dead on arrival if it didn’t play nicely with LLMs. So such a new language would need to be written for LLMs primarily, because I don’t expect anyone to learn any new language. This is where the declarative nature of Nix is a huge advantage. Because environments are precisely described, it is much easier for LLMs to focus on generating code and not have to fight with environment setup. This is also the reason I took another radical decision: since Nix would be mandatory for setting up the environment, why bother building OS-specific binaries? I’d just build a Nix package for this language and let Nix handle the rest.</p>
<p>This architecture results in a DSL for orchestration, making it trivial to transfer data objects between different ecosystems without the usual FFI (Foreign Function Interface) friction.</p>
<p>With these ideas in mind, I started prompting Gemini to brainstorm and began by generating specification files. Very broad at first, but as days went by, more and more focused. The way I went about it (and still do) is that I first brainstorm an idea with an LLM, then I ask it to generate a specification file, then I refine it, ask it to generate a new specification file, and so on. Once I’m happy with the spec, I ask an LLM to generate a minimal implementation of it. Usually writing the spec and a first implementation is a task shared between Claude and Gemini (through Antigravity). Then I open a pull request and ask GitHub Copilot to review it (usually with GPT-5.x). I repeat this process until I’m happy with the implementation. I always ask for documentation and unit tests (and golden tests when relevant, more on this later).</p>
<p>I started to really believe that I had something interesting, so I gave it a shot, and called it <strong>T</strong>. I had long joked that the natural successor to R should be called T (because R is the successor to S… and no, I’m not going to call it Q because that sounds like the word for ass in French).</p>
<p>Something else that made me confident I could succeed, besides my own hubris, was that I am pretty familiar with unit testing, test-driven development, trunk-based development and Nix. When you combine all these elements, it makes developing with LLMs quite safe.</p>
<p>So I just started prompting. And now I’m quite happy to announce that there is a beta version of T that you can use today!</p>
<p>By leveraging Nix as a build engine, T can treat complex data science workflows as buildable derivations. A typical T pipeline looks like this:</p>
<pre>p = pipeline {
  -- 1. Python node: read data with pandas
  mtcars_pl = pyn(
    command = &lt;{
import pandas as pd
pd.read_csv(&quot;data/mtcars.csv&quot;, sep=&quot;|&quot;)
    }&gt;,
    include = [&quot;data/mtcars.csv&quot;],
    serializer = ^csv
  )

  -- 2. Python node: filter and serialize as CSV
  mtcars_pl_am = pyn(
    command = &lt;{
mtcars_pl[mtcars_pl['am'] == 1]
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 3. R node: read CSV and take head using functions.R
  mtcars_head = rn(
    command = &lt;{
my_head(mtcars_pl_am)
    }&gt;,
    functions = [&quot;src/functions.R&quot;],
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 4. R node: select column with dplyr
  mtcars_mpg = rn(
    command = &lt;{
library(dplyr)
mtcars_head %&gt;% select(mpg)
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- Render Quarto report
  report = node(script = &quot;src/report.qmd&quot;, runtime = Quarto)
}

-- Materialize the pipeline
populate_pipeline(p, build = true)
pipeline_copy() -- Copy the outputs from the Nix store to your working directory</pre>
<p>As you can see, each node has a <code>command</code> argument where you can write literal R or Python code. It is also possible to provide the path to a script instead. If packages need to be loaded for the code to work, you can just write the calls to load the required packages in the <code>command</code> argument as well.</p>
<p>While T is heavily inspired by the <code>{targets}</code> package in R, it takes the concept a step further by making pipelines <strong>first-class objects</strong> within the language itself. This means you can:</p>
<ul>
<li><strong>Compose Pipelines</strong>: You can define small, modular pipelines and then merge them into larger ones using standard operators.</li>
<li><strong>Static Analysis</strong>: Because the DAG (Directed Acyclic Graph) is defined within the language, T can validate your entire workflow (checking for circular dependencies or missing data) before a single line of code even runs.</li>
<li><strong>Heterogeneous Execution</strong>: A single pipeline can effortlessly mix R, Python, and native T code. Data is passed between these nodes using built-in serializers like <code>^csv</code>, <code>^arrow</code>, or even specialized formats like <code>^pmml</code> for traditional models and <code>^onnx</code> for deep learning architectures. It is also possible to define your own serializers.</li>
<li><strong>Immutable State</strong>: Each node output is managed by Nix, meaning if you haven’t changed the code or the data for a specific node, T (via Nix) will simply pull the cached result from previous runs.</li>
</ul>
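<p>The static-analysis bullet above boils down to classic checks on the dependency graph. Here is a minimal Python sketch of that kind of validation (not T's actual implementation; <code>validate_dag</code> is a made-up name), using Kahn's algorithm to detect cycles and a simple lookup to catch missing dependencies before anything runs:</p>

```python
def validate_dag(nodes):
    """nodes: dict mapping node name -> list of dependency names.
    Returns a valid execution order, or raises before any code runs."""
    missing = [d for deps in nodes.values() for d in deps if d not in nodes]
    if missing:
        raise ValueError(f"missing dependencies: {missing}")
    # Kahn's algorithm: repeatedly consume nodes whose deps are satisfied.
    indegree = {n: len(deps) for n, deps in nodes.items()}
    dependents = {n: [] for n in nodes}
    for n, deps in nodes.items():
        for d in deps:
            dependents[d].append(n)
    ready = [n for n, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        # Some nodes were never freed: a circular dependency exists.
        raise ValueError("circular dependency detected")
    return order
```

<p>Because the whole DAG is a value inside the language, this kind of check can run at definition time rather than halfway through a long build.</p>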
<p>But don’t let the “orchestrator” label fool you; T is also a capable language in its own right. It features a selection of built-in packages inspired by the <code>tidyverse</code> for data manipulation. Thanks to its Arrow backend, it is surprisingly fast. I even maintain a CI benchmark running on NYC Taxi data to ensure performance remains competitive.</p>
<p>I made sure that T is pretty easy to use with LLMs by providing a file called <code>summary.md</code> in the root of the GitHub repository. This file is meant to be used by LLMs to quickly learn the language’s syntax and generate code accordingly. You could also provide the whole help documentation to the LLM (found in the repository under <code>help/docs.json</code>), but I found that a summary is usually enough. There is also another experimental feature I’m thinking about, called <code>intent</code> blocks. These blocks would essentially be first-class structured comments used to anchor an LLM’s behaviour and make it more deterministic. They would be parsed by T and used to generate code accordingly. I have some ideas about how these could look, something like this:</p>
<pre>intent {
  description: &quot;Customer churn prediction&quot;,
  assumptions: [&quot;Age &gt; 18&quot;, &quot;NA imputed with multiple imputation&quot;],
  requires: [&quot;dataset.csv&quot;]
}</pre>
<section id="is-this-slop" class="level2">
<h2 class="anchored" data-anchor-id="is-this-slop">Is this slop?</h2>
<p>There’s a lot of skepticism about building your own language using LLMs, and I get it. I was pretty skeptical myself. So let me tell you what actually gives me confidence in T’s correctness: as of writing, 1753 unit tests, 122 golden tests, 13 end-to-end tests, and 18 full project demos are executed on every push and PR, on both Linux and macOS via GitHub Actions. That’s the verification regime, and it has to be rigorous precisely because I can’t audit the OCaml implementation by eye. This is actually one of the more interesting lessons from this project: when you can’t rely on code review, you have to over-invest in tests and specifications. The spec files, the enriched changelog, the <code>summary.md</code>, all of that context makes the LLM’s output more predictable, and the test suite tells you immediately when it isn’t.</p>
<p>From personal experience, when I generate R or Python code, the output looks a lot like what I would have written myself. The main failure mode I’ve noticed is lack of context: the more you give the model, the better the result. Letting separate LLMs review PRs and iterating through several loops helps catch what any single model misses.</p>
<p>I’m also confident in T’s safety from a different angle: it’s ultimately orchestrating Python and R code you write yourself, and that you can test independently.</p>
</section>
<section id="interested" class="level2">
<h2 class="anchored" data-anchor-id="interested">Interested?</h2>
<p>If you’re interested in trying it out or contributing, check out the <a href="https://github.com/b-rodrigues/tlang" rel="nofollow" target="_blank">official repository</a> or the <a href="https://tstats-project.org/" rel="nofollow" target="_blank">website</a>, and don’t hesitate to open an issue or a PR or contact me on the dedicated Matrix (https://matrix.to/#/#tproject:matrix.org) channel.</p>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<p>For the interested reader, here’s how to get started with T.</p>
<section id="how-to-get-started" class="level2">
<h2 class="anchored" data-anchor-id="how-to-get-started">How to get started</h2>
<p>If you have Nix installed, getting started with a new project is just a single command away:</p>
<pre># 1. Initialize a new project
nix run github:b-rodrigues/tlang -- init --project my_t_project
cd my_t_project</pre>
<p>There will be no other way to start a T project. As explained above, I don’t want to have to deal with providing OS-specific binaries, and since Nix is used by T as the build engine, you’ll need to have Nix installed on your system anyway. Might as well reuse it to manage the installation of T itself!</p>
<p>Inside the project’s folder, you’ll find a <code>tproject.toml</code> file. This is where you list the R and Python packages you’ll need. For example:</p>
<pre>[project]
name = &quot;r_py_xgboost_t&quot;
description = &quot;A T data analysis project&quot;

[dependencies]
# T packages this project depends on
# Format: package = { git = &quot;repository-url&quot;, tag = &quot;version&quot; }
# Example:
# stats = { git = &quot;https://github.com/t-lang/stats&quot;, tag = &quot;v0.5.0&quot; }

[r-dependencies]
packages = [&quot;dplyr&quot;, &quot;yardstick&quot;]

[py-dependencies]
version = &quot;python313&quot;
packages = [&quot;numpy&quot;, &quot;pandas&quot;, &quot;scikit-learn&quot;, &quot;xgboost&quot;]

[additional-tools]
packages = [&quot;quarto&quot;]

[t]
# Minimum T language version required
min_version = &quot;0.51.2&quot;</pre>
<p>Under “additional tools” you can add any package that is available in <code>nixpkgs</code>. If you need LaTeX, you can also add this dedicated section:</p>
<pre>\(\)
packages = [&quot;amsmath&quot;, &quot;geometry&quot;, &quot;hyperref&quot;, &quot;biblatex&quot;]</pre>
<p>You may have noticed that there is also a section for T packages; that’s right, T supports user-defined packages. Instead of starting a project you’d start a package:</p>
<pre>nix run github:b-rodrigues/tlang -- init --package my_package
cd my_package</pre>
<p>Instead of a <code>tproject.toml</code> file, you’ll have to fill a <code>DESCRIPTION.toml</code> file:</p>
<pre>[package]
name = &quot;my_package&quot;
version = &quot;0.1.0&quot;
description = &quot;A brief description of what my_package does&quot;
authors = [&quot;brodriguesco&quot;]
license = &quot;EUPL-1.2&quot;
homepage = &quot;&quot;
repository = &quot;&quot;

[dependencies]
# T packages this package depends on
# Format: package = { git = &quot;repository-url&quot;, tag = &quot;version&quot; }

[t]
# Minimum T language version required
min_version = &quot;0.5.0&quot;</pre>
<p>Another important file is the <code>flake.nix</code> that will be automatically generated. You shouldn’t have to touch it, but this <code>flake.nix</code> is what provides the reproducible development environment for running your project. To do so, simply use:</p>
<pre>nix develop</pre>
<p>This will install T and activate the environment. If you’ve added stuff to the <code>tproject.toml</code> you’ll have to run <code>t update</code> to sync the packages to the flake, and then rebuild the environment (you’ll need to exit the development environment with <code>exit</code> and rebuild it using <code>nix develop</code> again). Oh and by the way, T requires a Linux-like environment so if you’re on Windows, you’ll have to run T within <strong>WSL2</strong> (Windows Subsystem for Linux).</p>
<p>Once inside the <code>nix develop</code> shell, everything you need (the T interpreter, your specific versions of R/Python, and all project tools) is ready to use. You don’t need to manage virtual environments or Docker containers manually; T handles the heavy lifting via Nix under the hood.</p>
<p>You can browse examples on this <a href="https://github.com/b-rodrigues/t_demos" rel="nofollow" target="_blank">repository</a>.</p>
</section>
<section id="tooling-and-editor-support" class="level2">
<h2 class="anchored" data-anchor-id="tooling-and-editor-support">Tooling and Editor Support</h2>
<p>A language is only as good as its developer experience. I politely asked LLMs to implement a full Language Server (<strong>LSP</strong>) for T, which provides autocompletion, real-time diagnostics, and “Go to Definition” support.</p>
<ul>
<li>For <strong>VS Code / Positron</strong>: A dedicated extension providing syntax highlighting and LSP integration.</li>
<li>For <strong>Vim / Emacs</strong>: Detailed configuration guides and syntax files are available.</li>
<li>For <strong>Quarto</strong>: T is fully compatible with Quarto for literate programming, allowing you to run executable <code>{t}</code> chunks directly in your documents.</li>
</ul>
<p>For detailed setup instructions, check out the <a href="https://github.com/b-rodrigues/tlang/blob/main/docs/editors.md" rel="nofollow" target="_blank">Editor Support guide</a> in the official documentation.</p>
<p>There’s much more I haven’t covered here, so <a href="https://github.com/b-rodrigues/tlang" rel="nofollow" target="_blank">check out the official repository</a> or the <a href="https://tstats-project.org/" rel="nofollow" target="_blank">website</a>.</p>


</section>
</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://b-rodrigues.github.io/posts/2026-04-03-tproject.html"> Econometrics and Free Software</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/">You can just build your own programming language</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400328</post-id>	</item>
		<item>
		<title>AI agents can create convincing ecological models, but you still need to know what you’re doing</title>
		<link>https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/</link>
		
		<dc:creator><![CDATA[Seascapemodels]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Agentic AI tools like Claude Code can write and run code, fix its own errors, and produce a formatted report with figures. I wanted to know whether that translates into reliable ecological modelling, so we ran a test: three fisheries tasks, four...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/">AI agents can create convincing ecological models, but you still need to know what you’re doing</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/"> Seascapemodels</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>Agentic AI tools like Claude Code can write and run code, fix its own errors, and produce a formatted report with figures. I wanted to know whether that translates into reliable ecological modelling, so we ran a test: three fisheries tasks, four AI models, ten independent runs each, scored against a rubric. The results are published in <a href="https://doi.org/10.1111/faf.70079" rel="nofollow" target="_blank">Fish and Fisheries</a>.</p>
<p>We found agents can be genuinely useful, but only if you know how to use them well and only if you know enough about the analysis to catch what they miss.</p>
<section id="how-we-did-our-tests" class="level2">
<h2 class="anchored" data-anchor-id="how-we-did-our-tests">How we did our tests</h2>
<p>We used <a href="https://roo.cline.bot/" rel="nofollow" target="_blank">Roo Code</a>, an agentic AI that runs inside VS Code. Unlike a chatbot, it can write code, execute it, read error messages, and iterate autonomously. There are many popular agentic AI tools; Claude Code is the most popular right now. We chose Roo Code because it is open source and fully customisable.</p>
<p>We gave it detailed specification sheets and asked it to complete three tasks. One was a common ecological modelling task: fitting a generalized linear model (GLM) of fish abundance against coral habitat. The other two were tasks specialised to fisheries modelling: fitting a von Bertalanffy growth curve and running a yield per recruit analysis. We chose these because they are common in the ecological sciences, but specialised enough that LLMs probably haven’t seen many examples in their training data.</p>
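<p>For readers unfamiliar with the second task: the von Bertalanffy growth curve is the textbook form L(t) = L_inf * (1 - exp(-k * (t - t0))). A minimal Python sketch (the parameter values in the test are illustrative, not the study's data):</p>

```python
import math

def von_bertalanffy(t, l_inf, k, t0):
    """Expected length at age t: asymptotic length l_inf,
    growth rate coefficient k, theoretical age at zero length t0."""
    return l_inf * (1 - math.exp(-k * (t - t0)))
```

<p>Fitting means estimating <code>l_inf</code>, <code>k</code>, and <code>t0</code> from length-at-age data, typically by nonlinear least squares.</p>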
<p>We ran each task 10 times. LLM responses have some randomness, and this multiplies when doing long-running tasks. So consistency is as important to measure as their best performance. We scored every output against a rubric covering accuracy, code quality, and report quality.</p>
<p>We used four LLMs: two proprietary models, Claude Sonnet 4.0 and Sonnet 4.5 (which came out during review, so we added it later), and one open-weight model, Kimi K2, in both its standard and ‘exacto’ variants.</p>
<p>During review, Kimi K2 ‘exacto’ became available on the <a href="https://openrouter.ai/" rel="nofollow" target="_blank">OpenRouter</a> platform, so we added that. The exacto variant routes requests to the providers with the best performance; some providers run it cheaply. Long story short, exacto performed much better than requesting just any provider’s version of K2, which highlights the importance of running open-weight models on quality hardware.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/AI-agent-workflows.png?w=578&#038;ssl=1" class="img-fluid figure-img" data-recalc-dims="1"></p>
<figcaption>Agentic workflows vs AI assisted coding</figcaption>
</figure>
</div>
</section>
<section id="how-to-use-agentic-ai-for-ecological-modelling" class="level2">
<h2 class="anchored" data-anchor-id="how-to-use-agentic-ai-for-ecological-modelling">How to use agentic AI for ecological modelling</h2>
<p>We learned several key lessons about how to get the best out of agentic AI for ecological modelling.</p>
<p><strong>Write a detailed specification sheet.</strong> Our sheets ran to multiple pages covering analysis aims, data structure, recommended R functions and packages, expected outputs, and file naming conventions. This takes time, but writing a specification forces you to think carefully about what you actually want. <a href="https://github.com/cbrown5/agentic-ai-fisheries/blob/main/Scripts/glm-test-case/glm-readme.md" rel="nofollow" target="_blank">Here’s an example</a>.</p>
<p><strong>Specify the algorithms explicitly.</strong> Agents default to the most common method in their training data, which may not be appropriate for your question. If you want bootstrapped confidence intervals via the <code>boot</code> package, say so.</p>
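<p>As an illustration of the kind of algorithm worth spelling out explicitly, here is a percentile-bootstrap confidence interval sketched in plain Python (the study itself used R; <code>bootstrap_ci</code> is a hypothetical helper written for this post, not code from the paper):</p>

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample with replacement, compute the
    statistic each time, and take the empirical alpha/2 quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

<p>A spec sheet that names the method (percentile bootstrap), the number of replicates, and the alpha level leaves the agent far less room to silently substitute a different interval.</p>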
<p>Even then, they may not comply: both Claude models in our study repeatedly applied natural mortality to the first age class in the yield per recruit model despite explicit instructions not to. That’s a subtle error that affected catch estimates—the numbers that would inform fishery management. These quirks of agent behaviour highlight why expert supervision is essential.</p>
<p><strong>Run replicates and compare outputs.</strong> Accuracy scores varied substantially between runs: sometimes the agent nailed every parameter; sometimes it got some parts correct but made systematic errors in other parts of the analysis. Running multiple agents and comparing outputs is one way to identify the best solutions.</p>
<p><strong>Check the things the agent doesn’t know to check.</strong> None of our agents checked for collinearity between predictors in the GLM, even though it’s standard practice. We deliberately left it out of the specification to see whether they’d check anyway. The GLMs ran fine and the results looked coherent, but there was in fact strong collinearity between the predictors. The lesson here is that the agents are good at coding, but their conceptual implementation may be misleading, incomplete or logically flawed.</p>
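<p>The check the agents skipped is cheap to run yourself. A plain-Python sketch that flags highly correlated predictor pairs (purely illustrative: the function names and the data in the test are made up, and a real workflow would typically compute variance inflation factors, e.g. with <code>car::vif</code> in R):</p>

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def collinear_pairs(predictors, threshold=0.8):
    """predictors: dict name -> list of values.
    Returns (name_a, name_b, r) for every pair with |r| > threshold."""
    names = list(predictors)
    return [
        (a, b, r)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(r := pearson(predictors[a], predictors[b])) > threshold
    ]
```

<p>Running something like this on the design matrix before fitting would have surfaced the problem immediately, regardless of what the agent did.</p>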
</section>
<section id="the-biggest-problem-with-agentic-ai-is-that-it-can-produce-professionally-formatted-output-that-contains-logical-errors" class="level2">
<h2 class="anchored" data-anchor-id="the-biggest-problem-with-agentic-ai-is-that-it-can-produce-professionally-formatted-output-that-contains-logical-errors">The biggest problem with agentic AI is that it can produce professionally formatted output that contains logical errors</h2>
<p>The error type that concerns me most is professionally formatted output containing logical errors.</p>
<p>In our results we saw growth curves that plotted beautifully but used the wrong confidence interval method, and a yield analysis that applied mortality in the wrong sequence. A coding syntax error is immediately obvious. A methodological shortcut embedded in otherwise clean output may be invisible unless you already know what the answer should look like.</p>
<p>There is a genuine risk that inexperienced researchers will use these tools to produce analyses they cannot evaluate. Experienced researchers may also get overconfident and not check results thoroughly enough. These flaws can then leak through to applications, as we’ve seen where human errors in <a href="https://pnas.org/doi/10.1073/pnas.2426166122" rel="nofollow" target="_blank">ecological modelling impact decisions on invasive species</a>.</p>
<p>For scientists with strong quantitative foundations, agents offer a real efficiency gain. The specification sheets and rubrics from our study are in the supplemental materials if you want to adapt them. All our code is available on GitHub if you want to run your own tests (<a href="https://github.com/cbrown5/agentic-ai-fisheries/tree/main/Scripts" rel="nofollow" target="_blank">check this folder; each modelling ‘test-case’ has the specification sheet and other files</a>).</p>
<p>The paper is open access: <a href="https://doi.org/10.1111/faf.70079" rel="nofollow" target="_blank">Brown et al. 2026, Fish and Fisheries</a>.</p>


</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/"> Seascapemodels</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/">AI agents can create convincing ecological models, but you still need to know what you’re doing</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400276</post-id>	</item>
		<item>
		<title>A Better R Programming Experience Thanks to Tree-sitter</title>
		<link>https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/04/02/tree-sitter-overview/</guid>

					<description><![CDATA[<p>A little bit less than two years ago, building on work by Jim Hester and Kevin Ushey, Davis Vaughan completed a very impactful JavaScript file for the R community: an R grammar for the Tree-sitter parsing generator. He even got a round of applause for...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/">A Better R Programming Experience Thanks to Tree-sitter</a>]]></description>
<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>A little bit less than two years ago, building on work by Jim Hester and Kevin Ushey, Davis Vaughan completed a very impactful JavaScript file for the R community: an R grammar for the Tree-sitter parsing generator. He even got a round of applause for it during a talk at the useR! 2024 conference! So, did he get cheered for… grammatical rules in a <a href="https://github.com/r-lib/tree-sitter-r/blob/next/grammar.js" rel="nofollow" target="_blank">JavaScript file</a>? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f605.png" alt="😅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>No, the audience was excited about the <em>improved developer experience for R</em> that this file unlocked. R tooling around Tree-sitter is how you get</p>
<ul>
<li>reformatting through <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">Air</a> and linting through <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a>;</li>
<li>auto-completion or help on hover in the <a href="https://lionel-.github.io/slidedecks/2024-07-11-ark" rel="nofollow" target="_blank">Positron IDE</a>;</li>
<li>better <a href="https://github.com/orgs/community/discussions/120397" rel="nofollow" target="_blank">search</a> for R on GitHub;</li>
<li>and more!</li>
</ul>
<p>In this post, we’ll explain what Tree-sitter is, and how tools built on Tree-sitter can benefit your R development workflow.</p>
<h2>
Code parsing: what is Tree-sitter?
</h2><p><a href="https://tree-sitter.github.io/tree-sitter/" rel="nofollow" target="_blank">Tree-sitter</a> is a parser generator and incremental parsing library written in C, with bindings in several languages including Rust (and R!).</p>
<p>Let’s rewind a little bit. What does it mean to parse code?</p>
<p>Basically, given a string of code like</p>
<pre>a &lt;- mean(x, na.rm = TRUE)
</pre><p>how do you know that <code>mean</code> is a function name, <code>na.rm</code> an argument name, and <code>TRUE</code> a logical? You have to <em>parse</em> that code into what’s called a parse tree. You do that in your head when reading R code. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>R itself can obviously parse R code, thanks to its <a href="https://github.com/wch/r-source/blob/trunk/src/main/gram.y" rel="nofollow" target="_blank">grammar</a>. See for instance the <a href="https://github.com/wch/r-source/commit/a1425adea54bcc98eef86081522b5dbb3e149cdc#diff-ba804d7fa3fa053c1f57d46369f4432cb55c9c4f69f46ae6510d0d1fcc59f382" rel="nofollow" target="_blank">commit that introduced R’s native pipe</a>, which required extending R’s syntax and thus modifying its grammar.</p>
<p>You can use <a href="https://rdrr.io/r/base/parse.html" rel="nofollow" target="_blank"><code>parse()</code></a> and <a href="https://rdrr.io/r/utils/getParseData.html" rel="nofollow" target="_blank"><code>getParseData()</code></a> to parse R code.</p>
<div class="highlight">
<pre>parse(
 text = &quot;a &lt;- mean(x, na.rm = TRUE)&quot;,
 keep.source = TRUE
) |&gt;
 getParseData()
#&gt;    line1 col1 line2 col2 id parent                token terminal  text
#&gt; 23     1    1     1   26 23      0                 expr    FALSE      
#&gt; 1      1    1     1    1  1      3               SYMBOL     TRUE     a
#&gt; 3      1    1     1    1  3     23                 expr    FALSE      
#&gt; 2      1    3     1    4  2     23          LEFT_ASSIGN     TRUE    &lt;-
#&gt; 21     1    6     1   26 21     23                 expr    FALSE      
#&gt; 4      1    6     1    9  4      6 SYMBOL_FUNCTION_CALL     TRUE  mean
#&gt; 6      1    6     1    9  6     21                 expr    FALSE      
#&gt; 5      1   10     1   10  5     21                  '('     TRUE     (
#&gt; 7      1   11     1   11  7      9               SYMBOL     TRUE     x
#&gt; 9      1   11     1   11  9     21                 expr    FALSE      
#&gt; 8      1   12     1   12  8     21                  ','     TRUE     ,
#&gt; 13     1   14     1   18 13     21           SYMBOL_SUB     TRUE na.rm
#&gt; 14     1   20     1   20 14     21               EQ_SUB     TRUE     =
#&gt; 15     1   22     1   25 15     16            NUM_CONST     TRUE  TRUE
#&gt; 16     1   22     1   25 16     21                 expr    FALSE      
#&gt; 17     1   26     1   26 17     21                  ')'     TRUE     )
</pre>
</div>
<p>Or you could transform that same data into XML using Gábor Csárdi’s <a href="https://r-lib.github.io/xmlparsedata/" rel="nofollow" target="_blank">{xmlparsedata}</a>:</p>
<div class="highlight">
<pre>parse(
 text = &quot;a &lt;- mean(x, na.rm = TRUE)&quot;,
 keep.source = TRUE
) |&gt;
 xmlparsedata::xml_parse_data(pretty = TRUE) |&gt;
 xml2::read_xml() |&gt;
 as.character() |&gt;
 cat()
#&gt; &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; standalone=&quot;yes&quot;?&gt;
#&gt; &lt;exprlist&gt;
#&gt;   &lt;expr line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;28&quot; end=&quot;53&quot;&gt;
#&gt;     &lt;expr line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;1&quot; start=&quot;28&quot; end=&quot;28&quot;&gt;
#&gt;       &lt;SYMBOL line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;1&quot; start=&quot;28&quot; end=&quot;28&quot;&gt;a&lt;/SYMBOL&gt;
#&gt;     &lt;/expr&gt;
#&gt;     &lt;LEFT_ASSIGN line1=&quot;1&quot; col1=&quot;3&quot; line2=&quot;1&quot; col2=&quot;4&quot; start=&quot;30&quot; end=&quot;31&quot;&gt;&lt;-&lt;/LEFT_ASSIGN&gt;
#&gt;     &lt;expr line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;33&quot; end=&quot;53&quot;&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;9&quot; start=&quot;33&quot; end=&quot;36&quot;&gt;
#&gt;         &lt;SYMBOL_FUNCTION_CALL line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;9&quot; start=&quot;33&quot; end=&quot;36&quot;&gt;mean&lt;/SYMBOL_FUNCTION_CALL&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-LEFT-PAREN line1=&quot;1&quot; col1=&quot;10&quot; line2=&quot;1&quot; col2=&quot;10&quot; start=&quot;37&quot; end=&quot;37&quot;&gt;(&lt;/OP-LEFT-PAREN&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;11&quot; line2=&quot;1&quot; col2=&quot;11&quot; start=&quot;38&quot; end=&quot;38&quot;&gt;
#&gt;         &lt;SYMBOL line1=&quot;1&quot; col1=&quot;11&quot; line2=&quot;1&quot; col2=&quot;11&quot; start=&quot;38&quot; end=&quot;38&quot;&gt;x&lt;/SYMBOL&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-COMMA line1=&quot;1&quot; col1=&quot;12&quot; line2=&quot;1&quot; col2=&quot;12&quot; start=&quot;39&quot; end=&quot;39&quot;&gt;,&lt;/OP-COMMA&gt;
#&gt;       &lt;SYMBOL_SUB line1=&quot;1&quot; col1=&quot;14&quot; line2=&quot;1&quot; col2=&quot;18&quot; start=&quot;41&quot; end=&quot;45&quot;&gt;na.rm&lt;/SYMBOL_SUB&gt;
#&gt;       &lt;EQ_SUB line1=&quot;1&quot; col1=&quot;20&quot; line2=&quot;1&quot; col2=&quot;20&quot; start=&quot;47&quot; end=&quot;47&quot;&gt;=&lt;/EQ_SUB&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;22&quot; line2=&quot;1&quot; col2=&quot;25&quot; start=&quot;49&quot; end=&quot;52&quot;&gt;
#&gt;         &lt;NUM_CONST line1=&quot;1&quot; col1=&quot;22&quot; line2=&quot;1&quot; col2=&quot;25&quot; start=&quot;49&quot; end=&quot;52&quot;&gt;TRUE&lt;/NUM_CONST&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-RIGHT-PAREN line1=&quot;1&quot; col1=&quot;26&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;53&quot; end=&quot;53&quot;&gt;)&lt;/OP-RIGHT-PAREN&gt;
#&gt;     &lt;/expr&gt;
#&gt;   &lt;/expr&gt;
#&gt; &lt;/exprlist&gt;
</pre>
</div>
<p>In both cases, you recognize words such as <code>LEFT_ASSIGN</code> or <code>SYMBOL_FUNCTION_CALL</code>. Parsing is an essential step before the code is actually executed, but parsed code can also be used for other purposes, such as analyzing code without brittle regular expressions (does it call a particular <a href="https://nrennie.rbind.io/blog/how-to-make-your-own-rstats-wrapped/" rel="nofollow" target="_blank">function</a>?), navigating code (going from a function call to the definition of that function), or modifying code (replacing all occurrences of a function with another one).</p>
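<p>As a small illustration of that first use case, here is a minimal sketch, using only base R and the <code>getParseData()</code> output shown above, of answering “does this code call a particular function?” more robustly than a regular expression would (the helper name <code>calls_function</code> is made up for this example):</p>

```r
# Check whether a snippet of R code calls a given function, using parse data
# rather than regex. Unlike grepl("mean\\(", code), this is not fooled by
# comments, strings, or similarly named functions such as rowmean().
calls_function <- function(code, fun) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  any(tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text == fun)
}

calls_function("a <- mean(x, na.rm = TRUE)", "mean")       # TRUE
calls_function("# mean is mentioned in a comment", "mean") # FALSE
```

The same idea, expressed over Tree-sitter nodes instead of R's parse data, is what the query-based tools later in this post build on.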
<p>Now, Tree-sitter performs this same code parsing, but <strong>faster</strong>, notably thanks to its support for incremental parsing, which is key to updating the syntax tree as you type in your editor! Tree-sitter is language-agnostic in that it can parse any code as long as there is a grammar for it (think Rosetta Stone plugins). It has been used for many languages, which means many tools have been built around it.</p>
<p>To have Tree-sitter “learn” a new language, you need to give it a file containing the definition of that language’s syntax: a <em>grammar</em>. This is where the aforementioned JavaScript file by Davis Vaughan and collaborators comes into play! The <a href="https://github.com/r-lib/tree-sitter-r" rel="nofollow" target="_blank">tree-sitter-r repo</a>, which provides a translation of the R grammar into the format expected by Tree-sitter, is the basis of all tools presented in this post that take R code as input.</p>
<p>The {treesitter} R package allows us to use Tree-sitter from R. Here’s how to parse the same code as earlier with it; we need the <code>language()</code> function from {treesitter.r}<sup id="fnref:1"><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup>.</p>
<div class="highlight">
<pre>library(treesitter)
#&gt; 
#&gt; Attaching package: 'treesitter'
#&gt; The following object is masked from 'package:base':
#&gt; 
#&gt;     range
language &lt;- treesitter.r::language()
parser &lt;- parser(language)
text &lt;- &quot;a &lt;- mean(x, na.rm = TRUE)&quot;
parser_parse(parser, text)
#&gt; &lt;tree_sitter_tree&gt;
#&gt; 
#&gt; ── Text ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#&gt; a &lt;- mean(x, na.rm = TRUE)
#&gt; 
#&gt; ── S-Expression ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#&gt; (program [(0, 0), (0, 26)]
#&gt;   (binary_operator [(0, 0), (0, 26)]
#&gt;     lhs: (identifier [(0, 0), (0, 1)])
#&gt;     operator: &quot;&lt;-&quot; [(0, 2), (0, 4)]
#&gt;     rhs: (call [(0, 5), (0, 26)]
#&gt;       function: (identifier [(0, 5), (0, 9)])
#&gt;       arguments: (arguments [(0, 9), (0, 26)]
#&gt;         open: &quot;(&quot; [(0, 9), (0, 10)]
#&gt;         argument: (argument [(0, 10), (0, 11)]
#&gt;           value: (identifier [(0, 10), (0, 11)])
#&gt;         )
#&gt;         (comma [(0, 11), (0, 12)])
#&gt;         argument: (argument [(0, 13), (0, 25)]
#&gt;           name: (identifier [(0, 13), (0, 18)])
#&gt;           &quot;=&quot; [(0, 19), (0, 20)]
#&gt;           value: (true [(0, 21), (0, 25)])
#&gt;         )
#&gt;         close: &quot;)&quot; [(0, 25), (0, 26)]
#&gt;       )
#&gt;     )
#&gt;   )
#&gt; )
</pre>
</div>
<p>Tree-sitter is the workhorse of many tools, shown in the diagram below. All of them depend on Tree-sitter and the R grammar provided to it. Some of them are command-line interfaces (CLIs), while others are R packages.</p>
<figure><img src="https://i0.wp.com/ropensci.org/blog/2026/04/02/tree-sitter-overview/tree-sitter.png?w=578&#038;ssl=1"
alt="Diagram of Tree-sitter tooling for R. At the center is Tree-sitter especially its Rust bindings and the R grammar for treesitter. At the top is the input, R scripts. At the bottom from treesitter is the treesitter R package,; ast-grep that is used by astgrepr which is used by flir and that is used by the CLAUDE.md instructions for parsing code; Air that is used by Jarl; Ark that is used by the Positron IDE; R code browsing on GitHub." data-recalc-dims="1">
</figure>
<h2>
Browsing code interactively: Positron IDE, GitHub
</h2><p>The real reason the audience applauded Davis Vaughan is that he explained how the R grammar for Tree-sitter had been <a href="https://github.com/orgs/community/discussions/120397" rel="nofollow" target="_blank">deployed to GitHub</a>, so that browsing R code on GitHub is now almost as good an experience as browsing, say, JavaScript code. If you search for a function name in a repository, for instance, its definition is surfaced in the search results. See <a href="https://www.youtube.com/watch?v=Gm0ikRBAfwc" rel="nofollow" target="_blank">Davis’ slides</a> (also available as a <a href="https://github.com/DavisVaughan/2024-07-09_useR-2024" rel="nofollow" target="_blank">PDF</a>), or refer to the video below showing how typing <code>vetiver_model</code> in the search bar of the R vetiver repo makes the function definition the first result, which one can click to jump straight to the definition.</p>
<video controls preload="auto" width="450" playsinline class="html-video">
<source src="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/searching-for-vetiver-model-r-new.mp4" type="video/mp4">
<span></span>
</video>
<p>Also very useful is the use of Tree-sitter by <a href="https://github.com/posit-dev/ark" rel="nofollow" target="_blank">Ark</a>, the R kernel <a href="https://lionel-.github.io/slidedecks/2024-07-11-ark/#/language-server-protocol-1" rel="nofollow" target="_blank">used in the Positron IDE</a>. Ark is how you get autocompletion and help on hover in Positron. The video below shows how you can extend the selection to further steps of a pipeline in Positron.</p>
<video controls preload="auto" width="450" playsinline class="html-video">
<source src="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/expand-selection-2.mp4" type="video/mp4">
<span></span>
</video>
<p>This use case of Tree-sitter is also featured in <a href="https://www.youtube.com/watch?v=Gm0ikRBAfwc" rel="nofollow" target="_blank">Davis’ slides</a>. See also Lionel Henry’s and Davis Vaughan’s talk about Ark at <a href="https://youtu.be/8uRcB34Hhsw?si=UeWqIi9PtEOWqRsp&#038;t=2109" rel="nofollow" target="_blank">posit conf 2024</a>, especially the part about <a href="https://youtu.be/8uRcB34Hhsw?si=GBqntC6tW7D2WhBN&#038;t=2455" rel="nofollow" target="_blank">code assistance</a>.</p>
<p>Other development environments such as <a href="https://lists.gnu.org/archive/html/emacs-devel/2022-11/msg01443.html" rel="nofollow" target="_blank">Emacs</a> also have support for Tree-sitter.</p>
<h2>
Searching/browsing code
</h2><p>You can parse and search R code using the {treesitter} R package and the <a href="https://tree-sitter.github.io/tree-sitter/4-code-navigation.html" rel="nofollow" target="_blank">Tree-sitter query syntax</a>. The {treesitter} R package is a dependency of Simon Couch’s <a href="https://simonpcouch.github.io/gander/" rel="nofollow" target="_blank">{gander} package</a>, which aims to improve the experience of working with LLMs when writing R code. Another use case of the {treesitter} R package is {igraph.r2cdocs}, an <a href="https://roxygen2.r-lib.org/dev/articles/extending.html" rel="nofollow" target="_blank">extension</a> to {roxygen2} for the {igraph} package. It <a href="https://github.com/igraph/igraph.r2cdocs/blob/6be2a327a18deb823302caeab8b60a916f6fac62/R/roxygen.R#L119" rel="nofollow" target="_blank">parses all of igraph’s R code</a> to identify, for each exported function, whether it directly or indirectly calls a function whose name ends with <code>_impl</code>; such a call indicates a wrapper around a C igraph function, whose documentation can then be linked from the manual page of the R function.</p>
<p>The {pkgdepends} package calls Tree-sitter (<a href="https://github.com/r-lib/pkgdepends/blob/main/src/tree-sitter.c" rel="nofollow" target="_blank">C</a>) to detect <a href="https://github.com/r-lib/pkgdepends/blob/634661a7d91b41476fd1ab653fe3087a6e40b8a9/R/scan-deps.R#L340" rel="nofollow" target="_blank">dependencies in files</a>. Below we run it on the source of the <a href="https://docs.ropensci.org/saperlipopette/" rel="nofollow" target="_blank">saperlipopette R package</a>.</p>
<div class="highlight">
<pre>pkgdepends::scan_deps(
 &quot;../../../../../CHAMPIONS/saperlipopette&quot;,
 &quot;../../../../../CHAMPIONS&quot;
)
#&gt; 
#&gt; Dependencies:
#&gt; + brio  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + cli  @ inst/exo_bisect-Rprofile.en.R, inst/exo_bisect-Rprofile.es.R, inst/exo_bisect-Rprofile.fr.R, inst/exo_blame-…
#&gt; + devtools  @ saperlipopette.Rproj
#&gt; + fs  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + gert  @ inst/exo_check_editor-Rprofile.en.R, inst/exo_check_editor-Rprofile.es.R, inst/exo_check_editor-Rprofile.fr.…
#&gt; + knitr  @ README.Rmd
#&gt; + parsedate  @ R/utils-git.R
#&gt; + purrr  @ R/create-all.R, R/debug.R, R/log-deleted-file.R, R/log-deleted-line.R, R/revparse.R, R/roxygen2.R, R/worktre…
#&gt; + rlang  @ R/create-all.R, R/roxygen2.R, R/utils-fs.R, R/utils-usethis.R, R/zzz.R
#&gt; + rmarkdown  @ README.Rmd, vignettes/saperlipopette.qmd
#&gt; + roxygen2  @ R/roxygen2.R, saperlipopette.Rproj
#&gt; + saperlipopette @ README.Rmd, vignettes/saperlipopette.qmd
#&gt; + tibble  @ R/roxygen2.R
#&gt; + usethis  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + vctrs  @ R/roxygen2.R
#&gt; + withr  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; 
#&gt; Test dependencies:
#&gt; + fs  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
#&gt; + gert  @ tests/testthat/test-blame.R, tests/testthat/test-clean-dir.R, tests/testthat/test-committed-to-main.R, tests…
#&gt; + rlang  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
#&gt; + saperlipopette @ tests/testthat.R
#&gt; + testthat  @ tests/testthat.R
#&gt; + withr  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
</pre>
</div>
<p><a href="https://ast-grep.github.io/" rel="nofollow" target="_blank">ast-grep</a> is a useful tool built on Tree-sitter for searching and re-writing code, with a clearer query syntax than Tree-sitter’s. Its name is reminiscent of grep, but with ast-grep we do not need to write brittle regular expressions <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" />. <a href="https://astgrepr.etiennebacher.com/" rel="nofollow" target="_blank">{astgrepr}</a> by Etienne Bacher is an R wrapper to the Rust bindings of ast-grep, and is used in Etienne’s <a href="https://flir.etiennebacher.com/" rel="nofollow" target="_blank">{flir} package</a> for <a href="https://flir.etiennebacher.com/articles/adding_rules" rel="nofollow" target="_blank">refactoring</a> code.</p>
<p>The ast-grep command-line interface (CLI) itself is featured in a useful <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">blog post by Emil Hvitfeldt</a> where he explains how to document the usage of ast-grep for Claude.</p>
<h2>
Formatting and linting: Air, Jarl
</h2><p>Speaking of CLIs…</p>
<figure><img src="https://i2.wp.com/ropensci.org/blog/2026/04/02/tree-sitter-overview/meme.png?w=578&#038;ssl=1"
alt="Cute kitten attacked by robots. The text says &#39;Everytime you use Claude for something a CLI can do, a kitten dies&#39;." data-recalc-dims="1">
</figure>
<p><a href="https://posit-dev.github.io/air/cli.html" rel="nofollow" target="_blank">Air</a>, by Davis Vaughan and Lionel Henry, is a CLI built on Tree-sitter, in Rust. It <em>reformats</em> code blazingly fast.</p>
<p><a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a>, by Etienne Bacher, is a CLI built on Air, therefore also on Tree-sitter, in Rust. It <em>lints</em> and <em>fixes</em> code, also blazingly fast. It can even detect <a href="https://jarl.etiennebacher.com/rules/unreachable_code" rel="nofollow" target="_blank">unreachable code</a>, <a href="https://jarl.etiennebacher.com/rules/unused_function" rel="nofollow" target="_blank">unused functions</a> and <a href="https://jarl.etiennebacher.com/rules/duplicated_function_definition" rel="nofollow" target="_blank">duplicated function definitions</a>.</p>
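<p>For a sense of the workflow, Air is driven from the command line (or from format-on-save in your IDE). A minimal invocation, assuming the <code>air</code> binary is installed and on your PATH:</p>

```shell
# Reformat all R files under the current directory, in place.
air format .
```

<p>Jarl is invoked similarly from the command line; see its documentation for the exact subcommands and the list of rules it can fix automatically.</p>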
<p>In both of these examples, the creation of <em>CLIs</em> wrapping Rust bindings was more efficient than the creation of R packages wrapping the {treesitter} R package, for several reasons:</p>
<ul>
<li>Rust CLIs can edit code very fast<sup id="fnref:2"><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fn:2" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">2</a></sup>;</li>
<li>CLIs are integrated in extensions for popular IDEs (for instance Positron);</li>
<li>a CLI is easier to install on CI than an R package that needs, well, an R installation.</li>
</ul>
<h2>
More tools
</h2><p>A brief mention of some other interesting tools we’ve explored a bit less.</p>
<h3>
Configuring: {ts} for parsing JSON and TOML (not R!)
</h3><p>The <a href="https://github.com/r-lib/ts" rel="nofollow" target="_blank">{ts}</a> package by Gábor Csárdi is the backbone of two R packages used for editing and manipulating:</p>
<ul>
<li>TOML <a href="https://gaborcsardi.github.io/tstoml/" rel="nofollow" target="_blank">{tstoml}</a>;</li>
<li>JSON <a href="https://gaborcsardi.github.io/tsjsonc/" rel="nofollow" target="_blank">{tsjson}</a>.</li>
</ul>
<p>Compared to existing parsers in R for those formats, these two packages preserve comments.</p>
<h3>
Testing code: {muttest}
</h3><p><a href="https://en.wikipedia.org/wiki/Mutation_testing" rel="nofollow" target="_blank">Mutation testing</a> is a kind of testing where you, say, randomly swap <code>+</code> with <code>-</code> in your code (you <em>mutate</em> it) and then run your tests to see whether they catch the mutant. The <a href="https://github.com/jakubsob/muttest" rel="nofollow" target="_blank">{muttest} package</a> by Jakub Sobolewski is an R package for mutation testing, which depends on the {treesitter} R package.</p>
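<p>To make the idea concrete, here is a toy sketch of one mutation round, independent of {muttest} (which operates on the syntax tree rather than on raw text as this simplified version does; the names below are made up for illustration):</p>

```r
# Original function and a small test for it.
add <- function(x, y) x + y
test_add <- function(f) isTRUE(all.equal(f(2, 3), 5))

# One "mutation": swap + for - in the deparsed source, then rebuild the function.
src <- deparse(add)
mutant_src <- gsub("+", "-", src, fixed = TRUE)
mutant <- eval(parse(text = paste(mutant_src, collapse = "\n")))

test_add(add)    # TRUE: the original passes
test_add(mutant) # FALSE: the test "kills" the mutant
```

<p>If a mutant survives (the tests still pass), that part of the code is effectively untested.</p>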
<h3>
Diffing code: difftastic
</h3><p>The difftastic CLI by Wilfred Hughes is “a structural diff tool that understands syntax”. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2728.png" alt="✨" class="wp-smiley" style="height: 1em; max-height: 1em;" /> This means that difftastic doesn’t only compare lines or “words” but actual syntax, looking at the lines around the ones that changed (by default, 3). Even better, it understands R out of the box. See this <a href="https://masalmon.eu/2026/03/30/difftastic/" rel="nofollow" target="_blank">blog post with examples of R code diffing</a>.</p>
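<p>Assuming difftastic is installed (its binary is named <code>difft</code>), a structural diff of two R files is a single call:</p>

```shell
# Compare two versions of a script; difftastic picks the R parser from the
# .R extension and diffs syntax nodes rather than raw lines.
difft old.R new.R
```

<p>You can also point Git at it (for example via the <code>GIT_EXTERNAL_DIFF</code> environment variable) to get structural diffs from <code>git diff</code>.</p>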
<h2>
Conclusion: more to come?
</h2><p>In this post, we’ve presented an overview of Tree-sitter based tooling for R or in R.</p>
<p>Note that this ecosystem of tools is very actively developed, so some tools might come and go. However, the idea that plugging the R grammar into a general parser generator brings cool features to us R developers will remain true. Maybe <em>you</em> will contribute to this ecosystem, either through an existing tool or by creating a new one?</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>We could also parse C code with it using <a href="https://sounkou-bioinfo.github.io/treesitter.c/" rel="nofollow" target="_blank">{treesitter.c}</a>. <a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
<li id="fn:2">
<p>Rust is a lower-level language than R, so it has less overhead; furthermore, this kind of Rust code can be easily parallelized. <a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fnref:2" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/">A Better R Programming Experience Thanks to Tree-sitter</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400270</post-id>	</item>
		<item>
		<title>Techtonique dot net is down until further notice</title>
		<link>https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down</guid>

					<description><![CDATA[<p>Techtonique dot net is down until further notice</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/">Techtonique dot net is down until further notice</a>]]></description>
<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><strong>IMPORTANT: The website <a href="https://www.techtonique.net/" rel="nofollow" target="_blank">https://www.techtonique.net</a> is down until further notice.</strong></p>

<p><a href="https://www.techtonique.net/" rel="nofollow" target="_blank">https://www.techtonique.net</a> contained a language-agnostic API for machine learning tasks (classification, regression, survival analysis, forecasting, etc.).</p>

<p>As a result, please do not buy the Gumroad tutorial for the time being.</p>

<p>You can still use the packages <a href="https://github.com/Techtonique" rel="nofollow" target="_blank">https://github.com/Techtonique</a> locally.</p>

<p>PS: This is not an April Fools’ joke.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/">Techtonique dot net is down until further notice</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400260</post-id>	</item>
		<item>
		<title>Transgender Day of Visibility</title>
		<link>https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/</link>
		
		<dc:creator><![CDATA[Jerry Tuttle]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 04:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=5ea12746cec1b25e0746ca21879d6565</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>     <br />
March 31 is Transgender Day of Visibility. I’m not transgender myself, but I have friends, acquaintances, and family members who are. Chances are you do too, whether you realize it or not.   </p>
<p>  &#038;nbs...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/">Transgender Day of Visibility</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://onlinecollegemathteacher.blogspot.com/2026/03/transgender-day-of-visibility.html"> Online College Math Teacher</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<font size = 3>

  
 <div class="separator" style="clear: both;"><a href="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZfk2-sxWBT_tk6b5Aq5W_ZFTvUKWWBlbiDmUau8As6CakkBCcLnsYGrDEQpTekShVIbQFBUBWux9Kf7YcQ1LW6In9s8uh-vjliUKtDdpH5OzVs50JAeA9HTGG5gFrWy1eJqFub5puN3zIv5Pj0jji6KLwkmLFq1rKUx3KwIWOpmXhVcFRZCW-EY7iEqE/s1152/Transgender_Pride_Flag.jpeg?ssl=1" style="display: block; padding: 1em 0; text-align: center; " rel="nofollow" target="_blank"><img alt="" border="0" width="400" data-original-height="487" data-original-width="450" src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZfk2-sxWBT_tk6b5Aq5W_ZFTvUKWWBlbiDmUau8As6CakkBCcLnsYGrDEQpTekShVIbQFBUBWux9Kf7YcQ1LW6In9s8uh-vjliUKtDdpH5OzVs50JAeA9HTGG5gFrWy1eJqFub5puN3zIv5Pj0jji6KLwkmLFq1rKUx3KwIWOpmXhVcFRZCW-EY7iEqE/s400/Transgender_Pride_Flag.jpeg?resize=400%2C487&#038;ssl=1" data-recalc-dims="1"/></a></div>
  
     
<p>March 31 is Transgender Day of Visibility. I’m not transgender myself, but I have friends, acquaintances, and family members who are. Chances are you do too, whether you realize it or not.</p>

<p>My understanding is that transgender is an umbrella term for people whose gender identity or expression differs from the sex they were assigned at birth. It reflects a deeply held internal sense of self—something not defined by appearance, clothing, or medical procedures. Being transgender is about identity, not sexual orientation.</p>

<p>What I’ve learned over time is that many transgender people face challenges most of us never have to think about. These include discrimination, gaps in legal protection, denial of medical care, and even physical violence. There are also everyday barriers that rarely make headlines, like difficulty obtaining accurate driver’s licenses or passports—documents most of us take for granted.</p>

<p>I found this resource helpful: <a href="https://www.hrc.org/resources/understanding-the-transgender-community">Understanding the Transgender Community</a>.</p>

<p>I wish every transgender person could live their life openly, safely, and without being hassled for who they are.</p>

<p>The five-color Transgender Pride Flag was designed by Monica Helms in 1999. I made this flag in R. Here is the code.</p>

<pre>

library(ggplot2)

# Define the colors in order: Blue, Pink, White, Pink, Blue
trans_colors &lt;- c(&quot;#5BCEFA&quot;, &quot;#F5A9B8&quot;, &quot;#FFFFFF&quot;, &quot;#F5A9B8&quot;, &quot;#5BCEFA&quot;)

# Create a data frame for the 5 stripes
flag_data &lt;- data.frame(
  stripe = factor(1:5),
  height = rep(1, 5)
)

# Plot the flag
ggplot(flag_data, aes(x = 1, y = height, fill = stripe)) +
  geom_bar(stat = &quot;identity&quot;, width = 1, color = NA) +
  scale_fill_manual(values = rev(trans_colors)) + # Reverse to stack correctly
  theme_void() + # Remove axes and labels
  theme(legend.position = &quot;none&quot;) +
  coord_cartesian(expand = FALSE)


</pre>
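<p>For readers without ggplot2 installed, the same five stripes can be drawn with base R graphics alone. This is a minimal sketch under that assumption; the <code>rect()</code> coordinate scheme is my own choice, not from the original post:</p>

```r
# Base-R sketch of the same flag (assumption: ggplot2 is unavailable).
# Stripe i is drawn as a rectangle spanning y = 5 - i to y = 6 - i,
# so the first color in the vector lands on the top stripe.
trans_colors <- c("#5BCEFA", "#F5A9B8", "#FFFFFF", "#F5A9B8", "#5BCEFA")

# Empty canvas with no axes, exactly tall enough for five unit-height stripes
plot(NULL, xlim = c(0, 1), ylim = c(0, 5), axes = FALSE,
     xlab = "", ylab = "", xaxs = "i", yaxs = "i")
for (i in seq_along(trans_colors)) {
  rect(0, 5 - i, 1, 6 - i, col = trans_colors[i], border = NA)
}
```

<p>Because the palette is symmetric (blue, pink, white, pink, blue), top-to-bottom and bottom-to-top stacking give the same picture, which is also why the <code>rev()</code> in the ggplot2 version is harmless.</p>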
</font>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://onlinecollegemathteacher.blogspot.com/2026/03/transgender-day-of-visibility.html"> Online College Math Teacher</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/">Transgender Day of Visibility</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400212</post-id>	</item>
		<item>
		<title>Meet dataviewR: The View() You Always Wanted</title>
		<link>https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/</link>
		
		<dc:creator><![CDATA[Siddhesh Pujari]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Disclaimer: This blog contains the authors’ own opinions, which do not necessarily reflect the strategy of their respective organizations.</p>
<p>The humble View() and its limits<br />
View() has served R programmers well for a long time — pass ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/">Meet dataviewR: The View() You Always Wanted</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html"> pharmaverse blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<!--------------- typical setup ----------------->
<!--------------- post begins here ----------------->
<p>Disclaimer: This blog contains the authors’ own opinions, which do not necessarily reflect the strategy of their respective organizations.</p>
<section id="the-humble-view-and-its-limits" class="level2">
<h2 class="anchored" data-anchor-id="the-humble-view-and-its-limits">The humble <code>View()</code> and its limits</h2>
<p><code>View()</code> has served R programmers well for a long time — pass it a data frame, get a spreadsheet-style window. It even has a basic search bar for checking whether a value exists in your data. But the moment you need something more precise — filter by a specific column or combine conditions — you are back in your script.</p>
<p>Add to that: no side-by-side dataset comparison, no metadata inspection, and no way to carry your exploration into reproducible code. For day-to-day clinical data work — reviewing your clinical datasets, let’s say <code>ADSL</code>, cross-checking subject demographics against lab data in <code>ADLB</code>, doing a QC pass before analysis — these gaps add up.</p>
</section>
<section id="what-is-dataviewr" class="level2">
<h2 class="anchored" data-anchor-id="what-is-dataviewr">What is dataviewR?</h2>
<p><a href="https://madhankumarnagaraji.github.io/dataviewR/" rel="nofollow" target="_blank"><code>dataviewR</code></a> is a Shiny-based interactive data viewer that works alongside <code>View()</code> as a companion — not a replacement. It launches directly in your RStudio Viewer pane, requires no Shiny code, and never modifies the datasets passed to it.</p>
<pre>install.packages(&quot;dataviewR&quot;)</pre>
</section>
<section id="features" class="level2">
<h2 class="anchored" data-anchor-id="features">Features</h2>
<p><code>dataviewR</code> offers the following capabilities:</p>
<ul>
<li><strong>Interactive Filtering</strong> — apply <code>dplyr</code>-style expressions directly in the app, no script changes needed. Supports <code>%in%</code>, <code>is.na()</code>, <code>grepl()</code>, and compound conditions — the same syntax you already write every day.</li>
</ul>
<pre>SEX == &quot;F&quot; & AGE &gt; 65 & TRT01P == &quot;Xanomeline High Dose&quot;</pre>
<ul>
<li><strong>Reproducible Code Generation</strong> — hit “Generate R Code” and walk away with ready-to-use <code>dplyr</code> code from your interactions. Your exploration session feeds directly into your scripted workflow.</li>
</ul>
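<p>To make the round trip concrete, here is the general shape of <code>dplyr</code> code such a filter-plus-column-selection session could hand back. This is a hand-written sketch on a toy data frame, not actual <code>dataviewR</code> output; the column names and values only mimic <code>ADSL</code>:</p>

```r
library(dplyr)

# Toy stand-in for ADSL (hypothetical subjects, for illustration only)
adsl_toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
  SEX     = c("F", "F", "M"),
  AGE     = c(70, 60, 72),
  TRT01P  = c("Xanomeline High Dose", "Xanomeline High Dose", "Placebo")
)

# The kind of pipeline a "Generate R Code" click could produce from
# the filter expression shown above plus a column selection:
result <- adsl_toy %>%
  filter(SEX == "F" & AGE > 65 & TRT01P == "Xanomeline High Dose") %>%
  select(USUBJID, SEX, AGE, TRT01P)

print(result)
```

<p>Pasting code like this back into a script is what turns an interactive exploration session into a reproducible step.</p>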
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/dataview_codegen.png?w=578&#038;ssl=1" class="img-fluid figure-img" style="width:60.0%" data-recalc-dims="1"></p>
<figcaption>Auto-generated dplyr code from interactive filter and column selections.</figcaption>
</figure>
</div>
<ul>
<li><strong>Variable Metadata</strong> — inspect variable classes, labels and attributes without writing <code>str()</code> or <code>attr()</code> calls. Particularly useful for clinical datasets where CDISC-style labels are carried as R attributes.</li>
</ul>
<p>See all of this in action:</p>
<video width="450" controls="" playbackrate="2" onloadedmetadata="this.playbackRate = 2;">
<source src="dataview_explore.mp4" type="video/mp4">
</video>
</section>
<section id="cross-checking-multiple-datasets" class="level2">
<h2 class="anchored" data-anchor-id="cross-checking-multiple-datasets">Cross-Checking Multiple Datasets</h2>
<p>But what if you need to look at more than one dataset at the same time? <code>dataviewR</code> handles that too — pass multiple datasets in a single call and each opens in its own tab within the same session. Switch between them, filter independently, and track a specific subject across datasets — the kind of check that comes up in every safety review.</p>
<pre>library(dataviewR)
library(pharmaverseadam)

dataviewer(adsl, adlb)</pre>
<video width="450" controls="" onloadedmetadata="this.playbackRate = 1.5;">
<source src="dataview_multidata.mp4" type="video/mp4">
</video>
</section>
<section id="final-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="final-thoughts">Final Thoughts</h2>
<p>That is dataviewR in a nutshell. Try it out and share your thoughts on <a href="https://github.com/madhankumarnagaraji/dataviewR" rel="nofollow" target="_blank">GitHub</a> or <a href="https://pharmaverse.slack.com/" rel="nofollow" target="_blank">pharmaverse Slack</a>!</p>
<p>Full documentation, vignettes, and clinical dataset examples are available at <a href="https://madhankumarnagaraji.github.io/dataviewR/" rel="nofollow" target="_blank">madhankumarnagaraji.github.io/dataviewR</a>.</p>
<!--------------- appendices go here ----------------->
</section>
<div class="cell">
<div class="cell-output-display">


</div>
</div>



<div id="quarto-appendix" class="default"><section id="last-updated" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Last updated</h2><div class="quarto-appendix-contents">

<p>2026-03-31 18:28:15.461796</p>
</div></section><section id="details" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Details</h2><div class="quarto-appendix-contents">

<p><a href="https://github.com/pharmaverse/blog/tree/main/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.qmd" rel="nofollow" target="_blank">Source</a>, <a href="https://pharmaverse.github.io/blog/session_info.html" rel="nofollow" target="_blank">Session info</a></p>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="nofollow" href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre>@online{pujari2026,
  author = {Pujari, Siddhesh and Kumar N, Madhan and S, Gomathi and
    Haight, Mackenzie},
  title = {Meet {dataviewR:} {The} {View()} {You} {Always} {Wanted}},
  date = {2026-03-31},
  url = {https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html},
  langid = {en}
}
</pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-pujari2026" class="csl-entry quarto-appendix-citeas">
Pujari, Siddhesh, Madhan Kumar N, Gomathi S, and Mackenzie Haight. 2026.
<span>“Meet dataviewR: The View() You Always Wanted.”</span> March 31,
2026. <a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html" rel="nofollow" target="_blank">https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html</a>.
</div></div></section></div> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html"> pharmaverse blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/">Meet dataviewR: The View() You Always Wanted</a>]]></content:encoded>
					
		
		<enclosure url="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/dataviewR-logo.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400225</post-id>	</item>
		<item>
		<title>AGENTS.md, {admiral}, and the AI-Assisted Programmer</title>
		<link>https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/</link>
		
		<dc:creator><![CDATA[Jeff Dickinson]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html</guid>

					<description><![CDATA[<p>Introduction<br />
AI coding assistants are becoming a natural part of how clinical R programmers work — autocompleting functions, suggesting test cases, drafting derivations. But out of the box, these agents know nothing about ADaM conventions, CD...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/">AGENTS.md, {admiral}, and the AI-Assisted Programmer</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html"> pharmaverse blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<!--------------- typical setup ----------------->
<!--------------- post begins here ----------------->
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>AI coding assistants are becoming a natural part of how clinical R programmers work — autocompleting functions, suggesting test cases, drafting derivations. But out of the box, these agents know nothing about ADaM conventions, CDISC standards, or how the pharmaverse ecosystem fits together. They don’t know that analysis flag variables such as <code>ANL01FL</code> typically take <code>&quot;Y&quot;</code> or <code>NA</code> and not <code>&quot;N&quot;</code>. They don’t know that <code>{pharmaversesdtm}</code> is the canonical source of test SDTM data, or that <code>{xportr}</code> is waiting downstream to turn your dataset into a submission-ready XPT file. <code>AGENTS.md</code> is a simple, open standard that changes that — and the <code>{admiral}</code> ecosystem now has infrastructure to generate and maintain these files automatically across every package in the family.</p>
</section>
<section id="what-is-agents.md" class="level2">
<h2 class="anchored" data-anchor-id="what-is-agents.md">What Is AGENTS.md?</h2>
<p><code>AGENTS.md</code> is a plain markdown file you commit to your repository that gives AI coding agents the context they need to work correctly in your project. Think of it as a README for agents — while <code>README.md</code> tells a new developer what the project is, <code>AGENTS.md</code> tells an AI assistant how to work in it correctly.</p>
<p>The format is supported across the growing ecosystem of AI coding tools: OpenAI Codex, GitHub Copilot, Google’s Jules, Cursor, Aider, Gemini CLI, and more. One file, version-controlled alongside your code, works everywhere.</p>
</section>
<section id="why-this-matters-for-the-pharmaverse" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters-for-the-pharmaverse">Why This Matters for the pharmaverse</h2>
<p>ADaM derivations encode decades of CDISC regulatory expectations that don’t appear anywhere in the R syntax. The fact that <code>ANL01FL</code> is an analysis flag with specific derivation logic, that <code>DTYPE = &quot;LLOQ&quot;</code> imputation records follow specific rules — none of this is inferable from the code alone.</p>
<p>The <code>{admiral}</code> package also doesn’t exist in isolation. It operates in a pipeline that flows from <code>{pharmaversesdtm}</code> test data through admiral derivations, often guided by <code>{metacore}</code> specifications, and ultimately out through <code>{xportr}</code> to submission-ready XPT files. An agent writing admiral code without that context is like a new programmer who only knows the function they’re editing — not the system it belongs to.</p>
<p>An <code>AGENTS.md</code> in an admiral-family repository can communicate all of this before the agent writes a single line of code.</p>
</section>
<section id="a-first-step-in-the-pharmaverse-ai-strategy" class="level2">
<h2 class="anchored" data-anchor-id="a-first-step-in-the-pharmaverse-ai-strategy">A First Step in the pharmaverse AI Strategy</h2>
<p>The <code>{admiral}</code> team is actively discussing how to formalize its approach to AI-assisted development — what tools to encourage, what guardrails to put in place, and how to document AI’s role in the programming strategy. That conversation is still early, and deliberately so: the consensus is to gain real experience before locking in formal guidance.</p>
<p>The good news is that experience is already arriving, fast.</p>
<p><strong>March 7, 2026 — the question arrives.</strong> <a href="https://github.com/pharmaverse/admiral/pull/2996" rel="nofollow" target="_blank">PR #2996</a> landed from a new contributor, <code>maxthecat2024</code>, fixing poorly formatted warning messages in <code>derive_param_computed()</code>. The PR was thorough and well-structured — detailed before/after examples, snapshot test conversions, a fully completed checklist. It was also the kind of contribution that made the team wonder: was this an AI bot? We don’t know for certain, and ultimately it didn’t matter — the code was good and it got merged. But the question itself was telling.</p>
<p><strong>March 17, 2026 — the reality arrives.</strong> <a href="https://github.com/pharmaverse/admiral/pull/3010" rel="nofollow" target="_blank">PR #3010</a> was opened not by a human contributor, but by GitHub Copilot itself — the branch named <code>copilot/enhance-examples-derive-vars-merged-summary</code>, the author listed as <code>Copilot</code>. The PR enhanced documentation examples for <code>derive_vars_merged_summary()</code>, correctly picking up the admiral-specific <code>@examplesx</code> structured example pattern from the existing codebase — context that came directly from <code>AGENTS.md</code>.</p>
<p>But the PR also revealed an important limitation. Rather than running <code>devtools::document()</code> to regenerate the <code>.Rd</code> file, Copilot manually edited <code>man/derive_vars_merged_summary.Rd</code> directly — and its own PR description acknowledged this: <em>“Manually updated to match <code>roxygen2::roxygenize()</code> output.”</em> When a reviewer pointed this out, Copilot responded candidly: <em>“R is not available in my sandbox environment, so I can’t execute <code>devtools::document()</code> directly. For future sessions, I understand the correct workflow.”</em> A human reviewer ran <code>devtools::document()</code> outside the sandbox and pushed the correctly generated <code>.Rd</code> file in commit <code>c855860</code>.</p>
<p>This is an important nuance: the issue wasn’t that <code>AGENTS.md</code> was unclear — it was that Copilot’s execution environment simply didn’t have R available. No instruction, however well-written, can make an agent run a command it physically cannot execute. <code>AGENTS.md</code> can teach an agent the correct workflow; ensuring the environment supports that workflow is a separate, human-owned responsibility. That distinction matters as the community develops its AI strategy.</p>
<p>This two-week window tells the whole story. AI-assisted contributions are already arriving in the <code>{admiral}</code> repository. <code>AGENTS.md</code> is already helping agents understand project-specific conventions. And the gaps it exposes are already informing improvements. This is the feedback loop the community needs to build a thoughtful AI strategy — not speculation, but evidence.</p>
<p><code>AGENTS.md</code> represents the first tangible infrastructure to come out of that thinking. Whether a contributor is a human using an AI assistant, an autonomous agent, or something in between — the code still needs to follow ADaM conventions and pharmaverse standards. <code>AGENTS.md</code> helps ensure it does.</p>
<p>If you have thoughts on what the broader strategy should look like, the discussion is open — join the conversation at <a href="https://github.com/pharmaverse/admiraldev/issues/547" rel="nofollow" target="_blank">admiraldev issue #547</a>. The broader scientific open-source community is working through similar questions: rOpenSci recently published an <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" rel="nofollow" target="_blank">AI policy</a> that’s worth reading as a reference point for how these norms are taking shape. Notably, rOpenSci’s policy calls out <code>agents.md</code> directly in its software review submission template:</p>
<blockquote class="blockquote">
<p>“If your repository includes an ‘agents.md’ file or equivalent, please provide a link, and describe how this has been used in the development of your package.”</p>
</blockquote>
<p>That the broader open-source scientific community is already asking for <code>AGENTS.md</code> as part of peer review is a signal that this norm is catching on fast.</p>
</section>
<section id="the-admiral-ecosystem-implementation" class="level2">
<h2 class="anchored" data-anchor-id="the-admiral-ecosystem-implementation">The Admiral Ecosystem Implementation</h2>
<p>Rather than each package maintaining its own file by hand, the generation logic is centralized in <code>{admiralci}</code> and delivers a consistent, up-to-date file to every package that opts in. The workflow pulls together content from several sources:</p>
<ul>
<li><strong>Programming strategy and unit testing guidelines</strong> from <code>{admiraldev}</code></li>
<li><strong>Package-specific context</strong> from a YAML file in each repository (therapeutic area, related packages, relevant CDISC IGs)</li>
<li><strong>Ecosystem context</strong> describing how admiral-family packages fit into the broader pharmaverse pipeline</li>
<li><strong>ADaM fundamentals</strong> covering key variable conventions and controlled terminology patterns</li>
</ul>
<p>The <code>{admiral}</code> <code>AGENTS.md</code> is substantial — over 1,300 lines of auto-generated context pulled directly from the <code>{admiraldev}</code> programming strategy, git usage, and R CMD check vignettes. It even includes a built-in verification mechanism: agents are instructed to add the comment <code># admiral guidelines loaded</code> to the first line of every new R file they create, confirming the guidelines were actually read. It’s a small but clever way to make agent compliance observable during code review.</p>
<p>Here’s a simplified illustration of the kind of content the file contains:</p>
<pre># AGENTS.md — admiral

## Package Overview
{admiral} provides a toolbox for ADaM dataset construction in R,
following CDISC ADaM standards and pharmaverse conventions.

## ADaM Conventions
- Flag variables (ANL01FL, SAFFL, etc.) take values &quot;Y&quot; or NA
- PARAM/PARAMCD pairs must align with CDISC controlled terminology
- ASEQ must be derived as the last step before dataset finalization

## Ecosystem Context
- Test SDTM data: use {pharmaversesdtm} (CDISC pilot data)
- Downstream: datasets consumed by {xportr} for XPT transport files
- Metadata: {metacore}/{metatools} provide spec-driven variable control

## Unit Testing
- Use {testthat} with expect_dfs_equal() for dataset comparisons
- Every new function requires tests for typical use, edge cases, and errors

## Documentation
- Run devtools::document() to regenerate .Rd files — never edit man/ directly
- Update NEWS.md for any user-facing function changes</pre>
<p>One practical note: <code>AGENTS.md</code> at the repository root triggers a NOTE in R CMD check, so the file is added to <code>.Rbuildignore</code>. A copy also lives in <code>tests/testthat/</code>, where testing-specific guidance is scoped closest to where it’s needed. Extension packages like <code>{admiralonco}</code>, <code>{admiralvaccine}</code>, and <code>{admiralpeds}</code> can layer their own context on top, adding therapeutic area-specific conventions without duplicating shared infrastructure content.</p>
</section>
<section id="how-to-adopt-this-in-your-package" class="level2">
<h2 class="anchored" data-anchor-id="how-to-adopt-this-in-your-package">How to Adopt This in Your Package</h2>
<p>If you maintain an admiral-family package, adoption is three steps:</p>
<ol type="1">
<li>Add a YAML configuration file to your repository with package-specific context (therapeutic area, related packages, relevant CDISC IGs)</li>
<li>Reference the reusable workflow from <code>{admiralci}</code> in your <code>.github/workflows/</code> directory</li>
<li>Add <code>^AGENTS\.md$</code> to your <code>.Rbuildignore</code></li>
</ol>
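<p>As an illustration, the package-context YAML from step 1 might look something like this. Every key name here is hypothetical — the post describes only what the file contains (therapeutic area, related packages, relevant CDISC IGs), so check the <code>{admiralci}</code> workflow documentation for the actual schema:</p>

```yaml
# Hypothetical package-context file; key names are illustrative,
# not the real {admiralci} schema.
package: admiralonco
therapeutic_area: oncology
related_packages:
  - admiral
  - pharmaversesdtm
  - metacore
cdisc_igs:
  - "ADaM Implementation Guide"
  - "ADaM Basic Data Structure for Time-to-Event Analyses"
```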
<p>The workflow then runs on a schedule, pulling the latest content from <code>{admiraldev}</code> and your package YAML and committing an updated <code>AGENTS.md</code> automatically.</p>
</section>
<section id="resources" class="level2">
<h2 class="anchored" data-anchor-id="resources">Resources</h2>
<ul>
<li><code>AGENTS.md</code> standard: <a href="https://agents.md/" class="uri" rel="nofollow" target="_blank">https://agents.md</a></li>
<li><code>{admiral}</code> <code>AGENTS.md</code> (live): <a href="https://github.com/pharmaverse/admiral/blob/main/AGENTS.md" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiral/blob/main/AGENTS.md</a></li>
<li><code>{admiral}</code> <code>sync-admiralci-agents</code> workflow (live): <a href="https://github.com/pharmaverse/admiral/blob/main/.github/workflows/sync-admiralci-agents.yml" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiral/blob/main/.github/workflows/sync-admiralci-agents.yml</a></li>
<li>pharmaverse AI strategy discussion: <a href="https://github.com/pharmaverse/admiraldev/issues/547" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiraldev/issues/547</a></li>
<li>pharmaverse examples site: <a href="https://pharmaverse.github.io/examples/" class="uri" rel="nofollow" target="_blank">https://pharmaverse.github.io/examples/</a></li>
<li>rOpenSci AI policy: <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" class="uri" rel="nofollow" target="_blank">https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/</a></li>
</ul>
<!--------------- appendices go here ----------------->
</section>



<div id="quarto-appendix" class="default"><section id="last-updated" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Last updated</h2><div class="quarto-appendix-contents">

<p>2026-03-30 18:40:51.936093</p>
</div></section><section id="details" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Details</h2><div class="quarto-appendix-contents">

<p><a href="https://github.com/pharmaverse/blog/tree/main/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.qmd" rel="nofollow" target="_blank">Source</a>, <a href="https://pharmaverse.github.io/blog/session_info.html" rel="nofollow" target="_blank">Session info</a></p>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="nofollow" href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre>@online{dickinson2026,
  author = {Dickinson, Jeff},
  title = {AGENTS.md, \{Admiral\}, and the {AI-Assisted} {Programmer}},
  date = {2026-03-31},
  url = {https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html},
  langid = {en}
}
</pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dickinson2026" class="csl-entry quarto-appendix-citeas">
Dickinson, Jeff. 2026. <span>“AGENTS.md, {Admiral}, and the AI-Assisted
Programmer.”</span> March 31, 2026. <a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html" rel="nofollow" target="_blank">https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html</a>.
</div></div></section></div> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html"> pharmaverse blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/">AGENTS.md, {admiral}, and the AI-Assisted Programmer</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400187</post-id>	</item>
		<item>
		<title>UK monarchs’ longevity against their people: a demographically correct reanalysis</title>
		<link>https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/</link>
		
		<dc:creator><![CDATA[Ilya Kashnitsky]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 22:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ikashnitsky.phd/2026/royal-longevity/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>NoteEnd of March is the time when I remember with warm nostalgia the vivid memories of working alongside and learning from Jim Vaupel, who died untimely on 27th March 2022. He was a brilliant demographer and a vital person who radiated lov...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/">UK monarchs’ longevity against their people: a demographically correct reanalysis</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ikashnitsky.phd/2026/royal-longevity/"> Ilya Kashnitsky</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<div class="callout callout-style-default callout-note callout-empty-content callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>End of March is when I recall with warm nostalgia the vivid memories of working alongside and learning from Jim Vaupel, whose untimely death came on 27 March 2022. He was a brilliant demographer and a vital person who radiated love for demography and influenced generations of researchers in finding and shaping their academic paths. Please read more about Jim on our collective memorial webpage – <a href="https://remembering-james-vaupel.org/" class="uri" rel="nofollow" target="_blank">https://remembering-james-vaupel.org</a> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f49a.png" alt="💚" class="wp-smiley" style="height: 1em; max-height: 1em;" /> In this post, I’m revisiting one of the last projects I worked on with Jim. Unlike all other posts on my blog, this one uses the plural voice, since an earlier draft of the analysis was co-authored with Jim.
</div>
</div>
<div class="callout-body-container callout-body">

</div>
</div>
<section id="resonant-headlines-in-the-context-of-global-news" class="level1">
<h1>Resonant headlines in the context of global news</h1>
<p>Following the death of Prince Philip in April 2021, <em>The Conversation</em> published a <a href="https://theconversation.com/long-live-the-monarchy-british-royals-tend-to-survive-a-full-three-decades-longer-than-their-subjects-158766" rel="nofollow" target="_blank">piece by Jay Olshansky</a> titled <em>“Long live the monarchy! British royals tend to survive a full three decades longer than their subjects.”</em> In this piece – essentially a blog post, yet routinely perceived by the media almost as a peer-reviewed article, the usual problem with The Conversation – the author compared the longevity of the last six UK monarchs and their spouses with that of their subjects. Employing a deeply flawed analysis, Olshansky arrived at sensational conclusions, which were, of course, elevated to the title of the piece and to the title of its only figure, which <a href="https://www.altmetric.com/details/103817538" rel="nofollow" target="_blank">circulated widely in the media</a>.</p>
<p><img src="https://i1.wp.com/ikashnitsky.phd/2026/royal-longevity/olshansky-plot.jpg?w=578&#038;ssl=1" class="img-fluid" style="width:80.0%" data-recalc-dims="1"></p>
<p>Drawing far-reaching conclusions based on a handful of individuals’ lifespans is already very problematic, since longevity of humans fluctuates a lot by chance. Yet, apart from this obvious statistical limitation, there are at least two purely demographic methodological flaws in the analysis that make the conclusions completely wrong. Olshansky compared the lifespan of a UK monarch or spouse with the <strong>period life expectancy</strong> that prevailed in the <strong>year of their birth</strong>. This is demographically wrong, for at least two reasons.</p>
</section>
<section id="flawed-design-of-the-analysis" class="level1 page-columns page-full">
<h1>Flawed design of the analysis</h1>
<p><strong>Firstly</strong>, period life expectancy for a given year is a poor predictor of the future lifespan of a child born in that year. <sup>1</sup> Despite its seemingly straightforward name, period life expectancy is <strong>not</strong> designed to forecast longevity, even though the indicator is all too often misinterpreted this way. It is just a summary measure of current age-specific death rates in a population. In other words, life expectancy gives the average length of life for a cohort of newborns only in the unlikely case – never observed in recorded human history – that death rates remain unchanged throughout their lives. Mortality, however, has <a href="https://doi.org/10.1073/pnas.2019536118" rel="nofollow" target="_blank">decreased substantially</a> over the past two centuries in all countries, including the UK, and the actual longevity of people born in a specific year is usually <a href="https://doi.org/10.1080/00324720600895876" rel="nofollow" target="_blank">much higher</a> than the period life expectancy observed when they were born.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup> Have a look at my <a href="https://ikashnitsky.github.io/2021/what-is-life-expectancy/" rel="nofollow" target="_blank">previous post</a> about common misinterpretations of life expectancy.</p></div></div><p>Why is this error so important? Because life expectancy at birth in historical populations was massively skewed by staggering infant and child mortality rates. When we hear that medieval peasants had a life expectancy of 35, it wasn’t because a hard life in the fields meant dropping dead at 36; it was because a huge fraction of the population died of disease during childhood. A peasant who reached adulthood actually had pretty good odds of reaching 60. And this leads us to the <strong>second</strong> massive flaw in the design of the initial analysis.</p>
<p>A monarch, by definition, has already survived childhood to reach the age of their coronation. It makes little sense to compare the actual fulfilled lifespans of royal individuals who went on to become monarchs with life expectancy <strong>at birth</strong> in the year of their birth. What about all their siblings who were less lucky? <sup>2</sup> One simply cannot evaluate historical longevity without properly accounting for survival bias. It’s all about selection and the luck of surviving through the hazardous early years of life. Back in the day, infant, child and early-adult mortality <a href="https://ourworldindata.org/child-mortality-in-the-past" rel="nofollow" target="_blank">used to be so high</a> that it is hard for us to imagine how society functioned when half of all live-born children did not reach their teens. For the purpose of this reanalysis, we need to factor in a truth that is obvious once you spell it out: only those royals who survived to the date of their coronation became monarchs. <sup>3</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup> In data analysis this common fallacy is known as <a href="https://xkcd.com/1827" rel="nofollow" target="_blank"><em>survivorship bias</em></a>.</p></div><div id="fn3"><p><sup>3</sup> Interestingly, the age of UK monarchs’ coronation varied widely, from 9 years for Queen Victoria to 59 years for King Edward VII.</p></div></div></section>
<section id="a-demographically-correct-approach" class="level1 page-columns page-full">
<h1>A demographically correct approach</h1>
<p>So, what should a proper comparison look like if we still want to evaluate whether royals lived exceptionally long compared to their subjects? The methodological corrections are straightforward: 1) Instead of period life tables, we should look at cohort life tables (also obtained from the <a href="https://www.mortality.org/" rel="nofollow" target="_blank">Human Mortality Database</a>); 2) As the comparison population, we need to look at the people who were born in the same year as the monarch in question and who survived at least to the age at which this monarch was crowned. We compare the monarch’s lifespan against the <strong>remaining cohort life expectancy</strong> of their birth cohort <em>at the exact age of the monarch’s coronation</em>. Correcting for these two errors, we obtained the demographically correct results below. <sup>4</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn4"><p><sup>4</sup> Let me re-iterate, just in case: we do not claim that this is a good way of researching the royal premium in survival. Our aim here is to correct the fundamental demographic flaws in the original widely circulated piece.</p></div></div><p><img src="https://i1.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival.png?w=578&#038;ssl=1" class="img-fluid" data-recalc-dims="1"></p>
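To make the correction concrete, here is a minimal sketch of how remaining cohort life expectancy at a given age can be computed from cohort survivorship. This is illustrative only – the function, the `age`/`lx` column names, and the constant-hazard toy cohort are assumptions for this example, not the code used in the actual analysis.

```r
# Sketch: remaining cohort life expectancy e(x), i.e. the average number of
# years lived after age x by cohort members who survived to age x.
# `lt` is assumed to be a data.frame with columns `age` and `lx` (survivors);
# these names are illustrative, not taken from the HMD files themselves.
remaining_cohort_e <- function(lt, x) {
  lt <- lt[lt$age >= x, ]
  # person-years lived after age x, approximated by the trapezoid rule
  person_years <- sum((head(lt$lx, -1) + tail(lt$lx, -1)) / 2)
  person_years / lt$lx[1]
}

# Toy cohort with a constant hazard of 0.05, so e(0) is close to 1/0.05 = 20
toy <- data.frame(age = 0:200, lx = exp(-0.05 * (0:200)))
remaining_cohort_e(toy, 0)  # approximately 20
```

Note that with a constant hazard the remaining expectancy is the same at every age; with real human mortality, conditioning on survival to the coronation age raises the expectancy substantially, which is exactly the correction at stake here.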
<p>The results? Yes, royals still enjoy a survival advantage over the general population. This is hardly a surprise – living in extreme privilege gives you access to the best diet, living environments, and medical care of your era. But the sensational headline from Olshansky’s piece no longer holds. Instead of the claimed 30-year advantage, we see a more modest 7.7 years of extra survival, on average, across the 12 monarchs and their spouses.</p>
<p>And let’s highlight again that even this corrected figure does not convincingly show that royals live much longer than their subjects. We are still comparing a summary of 12 individual lifespans against population-level demographic averages. Human lifespans vary. <sup>5</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn5"><p><sup>5</sup> In a <a href="http://doi.org/10.4054/DemRes.2021.44.35" rel="nofollow" target="_blank">recent article</a> we introduced a new outsurvival measure to study differences in longevity between populations, taking into account lifespan inequality.</p></div></div><p>The longevity premium for royalty depends on the age of coronation – the younger the monarch begins to reign, the fewer of those born in their year of birth have died. Thus, the biggest differences in longevity are for those monarchs who ascended the throne very young, such as Queen Victoria or Queen Elizabeth II. When Queen Elizabeth II was crowned at age 25, 88% of her birth cohort was still alive. In comparison, when King Edward VII was crowned at age 59, only 37% of his birth cohort was alive.</p>
<p>Another way to frame the comparison is to calculate the percentage of the monarch’s birth cohort who were alive at the coronation and whom the monarch subsequently outlived. This is a sort of p-score for the monarch’s longevity – how “well” he or she “performed” in a fair comparison with their birth cohort.</p>
<p><img src="https://i0.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival-relative.png?w=578&#038;ssl=1" class="img-fluid" data-recalc-dims="1"></p>
<p>King Edward VII outlived only 30% of the males his age who were alive at his coronation: 70% of those peers were still alive at his funeral. Prince Philip, by contrast, outlived 99.5% of the UK males born in 1921 who lived at least until 1952.</p>
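This share can be sketched in a couple of lines. The constant-hazard survivorship function below is purely illustrative (it is not the UK cohort data), so the resulting number will not match the 30% figure; what matters is the form of the calculation.

```r
# Sketch: share of the birth cohort alive at coronation age who died before
# the monarch did, i.e. 1 - l(death age) / l(coronation age).
# `lx` here is a toy survivorship function, not the HMD cohort data.
outlived_share <- function(lx, coronation_age, death_age) {
  1 - lx(death_age) / lx(coronation_age)
}

lx <- function(age) exp(-0.05 * age)  # constant-hazard toy cohort
outlived_share(lx, coronation_age = 59, death_age = 68)  # about 0.36 here
```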
</section>
<section id="bonus" class="level1">
<h1>Bonus</h1>
<p>For dedicated readers, we offer a third plot that combines all the data discussed in the text in one figure. We realise it may be slightly challenging to process, but we believe it provides a unique opportunity to see the whole data story “at a glance”.</p>
<p><img src="https://i2.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival-combined.png?w=578&#038;ssl=1" class="preview-image img-fluid" data-recalc-dims="1"></p>
<p><em>In the plot</em>: The colored stripes start at the age of the monarch’s coronation and fade out as the remaining birth cohort dies out; the average survival of the reference cohorts is marked with white vertical ticks; survival to the coronation is annotated in red labels.</p>
<hr>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Replication
</div>
</div>
<div class="callout-body-container callout-body">
<p>You can find the data and the <code>R</code> code to reproduce this re-analysis in <a href="https://gist.github.com/ikashnitsky/bec6af5ac0d57129a406ee5b5a522ce2" rel="nofollow" target="_blank">this GitHub gist</a>. The post is based on my earlier <a href="https://x.com/ikashnitsky/status/1382595760756244481" rel="nofollow" target="_blank">Twitter thread</a>.</p>
</div>
</div>



</section>


 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ikashnitsky.phd/2026/royal-longevity/"> Ilya Kashnitsky</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/">UK monarchs’ longevity against their people: a demographically correct reanalysis</a>]]></content:encoded>
					
		
		<enclosure url="https://ikashnitsky.phd/2026/royal-longevity/teaser.jpg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">400222</post-id>	</item>
		<item>
		<title>Same model, better shape: why centering improves MCMC</title>
		<link>https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/</link>
		
		<dc:creator><![CDATA[ouR data generation]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> The Emergency departments leading the transformation of Alzheimer’s and dementia care (ED-LEAD) study, which I have written about in the past, is approaching the end of its third year. This multifactorial design evaluates three independent, yet potenti...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/">Same model, better shape: why centering improves MCMC</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/"> ouR data generation</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>The <em>Emergency departments leading the transformation of Alzheimer’s and dementia care</em> (ED-LEAD) study, which I have written about in the <a href="https://www.rdatagen.net/post/2024-02-20-ensuring-balance-with-a-cluster-randomized-factorial-design/" rel="nofollow" target="_blank">past</a>, is approaching the end of its third year. This multifactorial design evaluates three independent, yet potentially synergistic, interventions aimed at improving care for persons living with dementia (PLWD) and their caregivers.</p>
<p>To estimate intervention effects, we are using what I’ve <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/sim.70264" rel="nofollow" target="_blank">called</a> the <em>HEx-factor model</em>, a Bayesian hierarchical exchangeable factorial model. The original plan was to conduct all analyses using <a href="https://mc-stan.org/" rel="nofollow" target="_blank"><code>Stan</code></a>. However, we’ve run into a bit of a snafu. I’ve been working through the problem, and thought I’d share here.</p>




<p>The challenge turns out to be a computational one. Because the <em>ED-LEAD</em> analyses must be conducted on National Institute on Aging (NIA) Data LINKAGE servers, we are working in a somewhat restricted software environment, at least with respect to Bayesian data analysis. In particular, we have not been able to install or run <code>Stan</code>, which was our analytic engine of choice. This forced us to consider alternatives, and we turned to <code>JAGS</code>, which <em>is</em> available in the Linkage environment and certainly is well-suited for Bayesian hierarchical modeling.</p>
<p>At first glance, this might seem like a straightforward substitution. Both <code>Stan</code> and <code>JAGS</code> allow us to specify the same likelihood and priors. However, I quickly noticed that the models were not performing as well in <code>JAGS</code> as they had in <code>Stan</code>. It turns out that the samplers used in <code>JAGS</code> are more sensitive to posterior dependence than the Hamiltonian Monte Carlo (HMC) methods implemented in <code>Stan</code>.</p>
<p>I set out to understand and fix the problem, and found that a simple reparameterization—re-coding the binary treatment indicators—made a substantial difference. With this change, the <code>JAGS</code> sampler was able to explore the posterior distribution much more efficiently, yielding results comparable to those obtained with <code>Stan</code>.</p>
<p>To understand why this happens, I ran a series of simple simulations comparing the original and reparameterized versions of a basic two-way factorial model. That is what I present here.</p>
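As a quick illustration of the dependence at play (a sketch, not part of the ED-LEAD analysis itself): under 0/1 coding the interaction column is strongly correlated with the main-effect columns, which is exactly the kind of posterior dependence that Gibbs-style samplers handle poorly, whereas centering makes the columns nearly orthogonal in a balanced design.

```r
# With A, B ~ Bernoulli(0.5), the product A*B is substantially correlated
# with A (theoretical correlation about 0.58), but the centered product
# (A - 0.5)*(B - 0.5) is uncorrelated with (A - 0.5).
set.seed(1)
A <- rbinom(1e5, 1, 0.5)
B <- rbinom(1e5, 1, 0.5)

cor(A * B, A)                          # large, roughly 0.58
cor((A - 0.5) * (B - 0.5), A - 0.5)    # near zero
```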
<div id="the-setup" class="section level3">
<h3>The setup</h3>
<p>In models with binary predictors and interactions, it turns out that <em>centering</em> can have a surprisingly large impact on computation, even though it does not change the underlying model. To see this clearly, I’ll start with a simple two-factor logistic model:
<span class="math display">\[
\text{logit}\big[P(Y=1)\big] = \alpha+ \beta_a A + \beta_b B + \beta_{ab}AB
\]</span>
where <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> are binary treatment indicators. I’ll compare this to the algebraically equivalent centered version:
<span class="math display">\[
\text{logit}\big[P(Y=1)\big] = \alpha^*+ \gamma_a A^* + \gamma_b B^* + \gamma_{ab}A^*B^*
\]</span>
where</p>
<p><span class="math display">\[
A^* = A - 0.5, \ \ \ B^* = B - 0.5.
\]</span></p>
<p>The scientific model is unchanged. The question is whether the sampler behaves differently.</p>
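One way to convince yourself of the equivalence before running any sampler: substituting A = A* + 0.5 and B = B* + 0.5 shows that the centered coefficients are linear combinations of the 0/1 coefficients, so the two linear predictors agree on every treatment cell. A minimal check (the coefficient values happen to match the defaults of the data-generating function used later in the post):

```r
# Map 0/1 coefficients to centered ones and confirm both parameterizations
# give the same linear predictor in all four treatment cells.
alpha <- -0.8; beta_a <- 0.5; beta_b <- 0.9; beta_ab <- -0.3

alpha_star <- alpha + 0.5 * beta_a + 0.5 * beta_b + 0.25 * beta_ab
gamma_a  <- beta_a + 0.5 * beta_ab
gamma_b  <- beta_b + 0.5 * beta_ab
gamma_ab <- beta_ab

cells <- expand.grid(A = 0:1, B = 0:1)
lp01 <- with(cells, alpha + beta_a * A + beta_b * B + beta_ab * A * B)
lpc  <- with(cells, alpha_star + gamma_a * (A - 0.5) + gamma_b * (B - 0.5) +
                      gamma_ab * (A - 0.5) * (B - 0.5))
all.equal(lp01, lpc)  # TRUE
```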
<div id="log-odds-ratios-under-each-parameterization" class="section level4">
<h4>Log odds ratios under each parameterization</h4>
<p>With 0/1 coding, the log-odds ratio for <span class="math inline">\(A\)</span> alone (that is, when <span class="math inline">\(B=0\)</span>) is simply
<span class="math display">\[
\begin{align*}
\text{lOR}_a(B=0) &#038;= (\alpha + \beta_a \cdot 1 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) -(\alpha + \beta_a \cdot 0 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) \\
&#038; = \beta_a
\end{align*}
\]</span>
Analogously, the log-odds ratio for <span class="math inline">\(B\)</span> alone is <span class="math inline">\(\beta_b\)</span>. And if we want to compare the combination of both <span class="math inline">\(A=1\)</span> and <span class="math inline">\(B=1\)</span> to the case where neither is activated, then
<span class="math display">\[
\begin{align*}
\text{lOR}_{ab} &#038;= (\alpha + \beta_a \cdot 1 + \beta_b \cdot 1 + \beta_{ab} \cdot 1) \\
&#038;\quad -
(\alpha + \beta_a \cdot 0 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) \\
&#038;= \beta_a + \beta_b + \beta_{ab}
\end{align*}
\]</span>
If instead we center the predictors, defining <span class="math inline">\(A^* = A - 0.5\)</span> and <span class="math inline">\(B^* = B - 0.5\)</span>, then the log-odds ratio of exposure to <span class="math inline">\(A\)</span> without exposure to <span class="math inline">\(B\)</span> relative to exposure to neither is
<span class="math display">\[
\begin{align*}
\text{lOR}_a(B=0)
&#038;=
(\alpha^* + \gamma_a(0.5) + \gamma_b(-0.5) + \gamma_{ab}(0.5)(-0.5)) \\
&#038;\quad -
(\alpha^* + \gamma_a(-0.5) + \gamma_b(-0.5) + \gamma_{ab}(-0.5)(-0.5)) \\
&#038;=
(\alpha^* + 0.5\gamma_a - 0.5\gamma_b - 0.25\gamma_{ab}) \\
&#038;\quad -
(\alpha^* - 0.5\gamma_a - 0.5\gamma_b + 0.25\gamma_{ab}) \\
&#038;=
\gamma_a - 0.5\gamma_{ab}.
\end{align*}
\]</span>
Using the same logic we can show that
<span class="math display">\[
\text{lOR}_{b} = \gamma_{b} - 0.5 \gamma_{ab}
\]</span>
and
<span class="math display">\[
\text{lOR}_{ab} = \gamma_a + \gamma_b.
\]</span>
</p>
</div>
</div>
<div id="bayesian-models-using-jags" class="section level3">
<h3>Bayesian models using JAGS</h3>
<p>The Bayesian model is a simple logistic regression with an interaction term:</p>
<p><span class="math display">\[
\begin{align*}
Y_i &#038;\sim \text{Bernoulli}(p_i), \\
\text{logit}(p_i)
&#038;= \alpha + \beta_a A_i + \beta_b B_i + \beta_{ab} A_i B_i,
\end{align*}
\]</span>
Here are the prior distribution assumptions, using variance-based notation to align with JAGS, which parameterizes normal distributions in terms of precision:
<span class="math display">\[
\begin{align*}
\alpha &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_a &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_b &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_{ab} &#038;\sim \mathcal{N}(0, 25^{-1}).
\end{align*}
\]</span>
The centered model is similar, except that we replace the coefficients with <span class="math inline">\(\alpha^*\)</span> as well as <span class="math inline">\(\gamma_a\)</span>, <span class="math inline">\(\gamma_b\)</span>, <span class="math inline">\(\gamma_{ab}\)</span>, and define the predictors in terms of centered versions of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>.</p>
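One small practical note on that convention (a sketch, not study code): `dnorm(mu, tau)` in JAGS takes a precision `tau = 1/variance`, so the priors above correspond to the following standard deviations.

```r
# JAGS parameterizes dnorm(mean, precision), with precision = 1 / variance,
# so sd = 1 / sqrt(precision). Translating the priors above:
prec_to_sd <- function(tau) 1 / sqrt(tau)
prec_to_sd(0.25)  # main effects and intercept: sd = 2
prec_to_sd(25)    # interaction: sd = 0.2
```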
</div>
<div id="simulations" class="section level3">
<h3>Simulations</h3>
<p>Before we get started on the simulations, we need to load the necessary libraries and set the seed in case you want to replicate these results:</p>
<pre>library(simstudy)
library(data.table)
library(ggplot2)
library(rjags)
library(coda)
library(posterior)
library(broom)
library(gt)

RNGkind(&quot;Mersenne-Twister&quot;, &quot;Inversion&quot;, &quot;Rejection&quot;)
set.seed(824)</pre>
<div id="creating-a-single-data-set" class="section level4">
<h4>Creating a single data set</h4>
<p>Here is the data generation process for a single data set. The outcome <span class="math inline">\(Y\)</span> is generated using the binary parameterization of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>:</p>
<pre>s_gen &lt;- function(n = 2000,
                    alpha = -0.8,
                    beta_a = 0.5,
                    beta_b = 0.9,
                    beta_ab = -0.3) {
  
  def &lt;- 
    defData(varname = &quot;A&quot;, formula = 0.5, dist = &quot;binary&quot;) |&gt;
    defData(varname = &quot;B&quot;, formula = 0.5, dist = &quot;binary&quot;) |&gt;
    defData(varname = &quot;AB&quot;, formula = &quot;A*B&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;A_c&quot;, formula = &quot;A - 0.5&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;B_c&quot;, formula = &quot;B - 0.5&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;AB_c&quot;, formula = &quot;A_c * B_c&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(
      varname = &quot;Y&quot;, 
      formula = &quot;..alpha + ..beta_a * A + ..beta_b * B + ..beta_ab * AB&quot;,
      dist = &quot;binary&quot;, link = &quot;logit&quot;
    )
    
  genData(n, def)
  
}

dd &lt;- s_gen()</pre>
</div>
<div id="the-two-parameterizations-fit-the-same-model" class="section level4">
<h4>The two parameterizations fit the same model</h4>
<p>First, here is the frequentist check of both models. The fitted probabilities are identical, even though the coefficients differ.</p>
<pre>fit_01 &lt;- glm(Y ~ A * B, data = dd, family = binomial)
fit_c  &lt;- glm(Y ~ A_c * B_c, data = dd, family = binomial)

tidy(fit_01)
## # A tibble: 4 × 5
##   term        estimate std.error statistic  p.value
##   &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
## 1 (Intercept)   -1.03     0.0995    -10.4  3.90e-25
## 2 A              0.835    0.135       6.18 6.34e-10
## 3 B              1.14     0.134       8.46 2.73e-17
## 4 A:B           -0.670    0.186      -3.61 3.11e- 4
tidy(fit_c)
## # A tibble: 4 × 5
##   term        estimate std.error statistic  p.value
##   &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
## 1 (Intercept)   -0.212    0.0464     -4.57 4.91e- 6
## 2 A_c            0.501    0.0929      5.39 7.05e- 8
## 3 B_c            0.802    0.0929      8.63 6.07e-18
## 4 A_c:B_c       -0.670    0.186      -3.61 3.11e- 4</pre>
<p>From the 0/1-coded model, <span class="math inline">\(\text{lOR}_a = 0.835\)</span>, <span class="math inline">\(\text{lOR}_b = 1.14\)</span>, and <span class="math inline">\(\text{lOR}_{ab} = 0.835 + 1.14 - 0.67 = 1.305.\)</span></p>
<p>From the centered model,
<span class="math display">\[
\text{lOR}_a = 0.501 + 0.5 \times 0.670 = 0.836
\]</span>
<span class="math display">\[
\text{lOR}_b = 0.802 + 0.5 \times 0.670 = 1.137
\]</span>
<span class="math display">\[
\text{lOR}_{ab} = 0.501 + 0.802 = 1.303
\]</span>
So the coefficients themselves change under centering, but the underlying treatment contrasts do not.</p>
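<p>The equivalence of the two sets of contrasts is just algebra: substituting <code>A = A_c + 0.5</code> and <code>B = B_c + 0.5</code> into the 0/1-coded linear predictor shows that each centered main effect equals the original main effect plus half the interaction, while the interaction coefficient itself is unchanged. A quick numerical check (a Python sketch using the rounded coefficients quoted above, not part of the original analysis) confirms the identity:</p>

```python
# Coefficients from the 0/1-coded fit quoted above (rounded)
beta_a, beta_b, beta_ab = 0.835, 1.14, -0.670

# Substituting A = A_c + 0.5 and B = B_c + 0.5 into the linear
# predictor gives the implied centered-model coefficients:
gamma_a  = beta_a + 0.5 * beta_ab   # 0.835 - 0.335 = 0.500
gamma_b  = beta_b + 0.5 * beta_ab   # 1.140 - 0.335 = 0.805
gamma_ab = beta_ab                  # the interaction is unchanged

# The treatment contrasts agree under either parameterization
lor_a_01, lor_a_c   = beta_a, gamma_a - 0.5 * gamma_ab
lor_ab_01, lor_ab_c = beta_a + beta_b + beta_ab, gamma_a + gamma_b

print(lor_a_01, lor_a_c)     # both 0.835
print(lor_ab_01, lor_ab_c)   # both ~1.305
```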
</div>
<div id="specifying-the-bayesian-models-in-jags" class="section level4">
<h4>Specifying the Bayesian models in JAGS</h4>
<p>Next, we will see that we can recover the same treatment contrasts from two different Bayesian models, though sampling will be considerably more efficient with centering.</p>
<p>Here is the JAGS code for each model:</p>
<pre>model_01 &lt;- &quot;
model {
  for (i in 1:N) {
    Y[i] ~ dbern(p[i])
    logit(p[i]) &lt;- alpha + beta_a * A[i] + beta_b * B[i] + beta_ab * AB[i]
  }
  
  alpha   ~ dnorm(0, 0.25)
  beta_a  ~ dnorm(0, 0.25)
  beta_b  ~ dnorm(0, 0.25)
  beta_ab ~ dnorm(0, 25)
}
&quot;

model_c &lt;- &quot;
model {
  for (i in 1:N) {
    Y[i] ~ dbern(p[i])
    logit(p[i]) &lt;- alpha + gamma_a * A_c[i] + gamma_b * B_c[i] + gamma_ab * AB_c[i]
  }
  
  alpha   ~ dnorm(0, 0.25)
  gamma_a  ~ dnorm(0, 0.25)
  gamma_b  ~ dnorm(0, 0.25)
  gamma_ab ~ dnorm(0, 25)
}
&quot;</pre>
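<p>One point worth flagging for readers less familiar with <code>JAGS</code>: <code>dnorm</code> is parameterized by precision (the reciprocal of the variance), not by the standard deviation. So <code>dnorm(0, 0.25)</code> is a fairly diffuse Normal prior with sd 2, while <code>dnorm(0, 25)</code> is a tight prior with sd 0.2, which is what shrinks the interaction. A one-line conversion (sketched here in Python) makes this explicit:</p>

```python
import math

def precision_to_sd(tau):
    """Convert a JAGS-style precision tau = 1 / sd^2 to a standard deviation."""
    return 1.0 / math.sqrt(tau)

print(precision_to_sd(0.25))  # 2.0 -> diffuse priors on alpha and the main effects
print(precision_to_sd(25))    # 0.2 -> tight prior shrinking the interaction
```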
</div>
<div id="fitting-the-models" class="section level4">
<h4>Fitting the models</h4>
<p>The function <code>fit_jags</code> fits one of the two models just described:</p>
<pre>fit_jags &lt;- function(dat, model_string, centered = FALSE,
                     n_chains = 3, burn = 2000, n_iter = 5000) {
  
  if (centered) {
    jdat &lt;- as.list(dat[, .(Y, A_c, B_c, AB_c)])
    vars &lt;- c(&quot;alpha&quot;, &quot;gamma_a&quot;, &quot;gamma_b&quot;, &quot;gamma_ab&quot;)
  } else {
    jdat &lt;- as.list(dat[, .(Y, A, B, AB)])
    vars &lt;- c(&quot;alpha&quot;, &quot;beta_a&quot;, &quot;beta_b&quot;, &quot;beta_ab&quot;)
  }
  jdat$N &lt;- nrow(dat)
  
  mod &lt;- jags.model(
    textConnection(model_string),
    data = jdat,
    n.chains = n_chains,
    quiet = TRUE
  )
  
  update(mod, burn, progress.bar = &quot;none&quot;)
  
  samp &lt;- coda.samples(
    mod,
    variable.names = vars,
    n.iter = n_iter,
    progress.bar = &quot;none&quot;
  )
  
  samp
}</pre>
<p>Now, we can fit the models, collect the diagnostic data, and take a look at the results:</p>
<pre>samp_01 &lt;- fit_jags(dd, model_01, centered = FALSE)
samp_c  &lt;- fit_jags(dd, model_c, centered = TRUE)

diag_tbl &lt;- function(samp, model_name) {
  post &lt;- as_draws_df(samp)
  summ &lt;- summarise_draws(post)
  out &lt;- as.data.table(summ)
  out[, model := model_name]
  out[]
}

diag_01 &lt;- diag_tbl(samp_01, &quot;0/1-coded&quot;)
diag_c  &lt;- diag_tbl(samp_c, &quot;centered&quot;)</pre>
<p>Here are the summary statistics of the posterior distribution as well as the computational diagnostics:</p>
<div id="bxfxwqtyqe" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#bxfxwqtyqe table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#bxfxwqtyqe thead, #bxfxwqtyqe tbody, #bxfxwqtyqe tfoot, #bxfxwqtyqe tr, #bxfxwqtyqe td, #bxfxwqtyqe th {
  border-style: none;
}

#bxfxwqtyqe p {
  margin: 0;
  padding: 0;
}

#bxfxwqtyqe .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 15px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#bxfxwqtyqe .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#bxfxwqtyqe .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#bxfxwqtyqe .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#bxfxwqtyqe .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#bxfxwqtyqe .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#bxfxwqtyqe .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#bxfxwqtyqe .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#bxfxwqtyqe .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#bxfxwqtyqe .gt_spanner_row {
  border-bottom-style: hidden;
}

#bxfxwqtyqe .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#bxfxwqtyqe .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#bxfxwqtyqe .gt_from_md > :first-child {
  margin-top: 0;
}

#bxfxwqtyqe .gt_from_md > :last-child {
  margin-bottom: 0;
}

#bxfxwqtyqe .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#bxfxwqtyqe .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#bxfxwqtyqe .gt_row_group_first td {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_row_group_first th {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#bxfxwqtyqe .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#bxfxwqtyqe .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#bxfxwqtyqe .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_left {
  text-align: left;
}

#bxfxwqtyqe .gt_center {
  text-align: center;
}

#bxfxwqtyqe .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#bxfxwqtyqe .gt_font_normal {
  font-weight: normal;
}

#bxfxwqtyqe .gt_font_bold {
  font-weight: bold;
}

#bxfxwqtyqe .gt_font_italic {
  font-style: italic;
}

#bxfxwqtyqe .gt_super {
  font-size: 65%;
}

#bxfxwqtyqe .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#bxfxwqtyqe .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#bxfxwqtyqe .gt_indent_1 {
  text-indent: 5px;
}

#bxfxwqtyqe .gt_indent_2 {
  text-indent: 10px;
}

#bxfxwqtyqe .gt_indent_3 {
  text-indent: 15px;
}

#bxfxwqtyqe .gt_indent_4 {
  text-indent: 20px;
}

#bxfxwqtyqe .gt_indent_5 {
  text-indent: 25px;
}

#bxfxwqtyqe .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#bxfxwqtyqe div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="variable">Parameter</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="mean">Mean</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="median">Median</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="sd">SD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="mad">MAD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="q5">5th %tile</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="q95">95th %tile</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="rhat">R-hat</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="ess_bulk">ESS (bulk)</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="ess_tail">ESS (tail)</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr class="gt_group_heading_row">
      <th colspan="10" class="gt_group_heading" style="font-weight: bold;" scope="colgroup" id="0/1-coded">0/1-coded</th>
    </tr>
    <tr class="gt_row_group_first"><td headers="0/1-coded  variable" class="gt_row gt_left">alpha</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">-0.937</td>
<td headers="0/1-coded  median" class="gt_row gt_right">-0.937</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.092</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.093</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">-1.088</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">-0.786</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1359.836</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">2862.560</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_a</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">0.665</td>
<td headers="0/1-coded  median" class="gt_row gt_right">0.665</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.117</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.118</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">0.472</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">0.855</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1612.670</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3084.048</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_ab</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">-0.352</td>
<td headers="0/1-coded  median" class="gt_row gt_right">-0.352</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.137</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.139</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">-0.579</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">-0.128</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1699.870</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3617.854</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_b</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">0.969</td>
<td headers="0/1-coded  median" class="gt_row gt_right">0.968</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.118</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.119</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">0.777</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">1.164</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1430.007</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3298.993</td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="10" class="gt_group_heading" style="font-weight: bold;" scope="colgroup" id="centered">centered</th>
    </tr>
    <tr class="gt_row_group_first"><td headers="centered  variable" class="gt_row gt_left">alpha</td>
<td headers="centered  mean" class="gt_row gt_right">-0.210</td>
<td headers="centered  median" class="gt_row gt_right">-0.210</td>
<td headers="centered  sd" class="gt_row gt_right">0.047</td>
<td headers="centered  mad" class="gt_row gt_right">0.047</td>
<td headers="centered  q5" class="gt_row gt_right">-0.285</td>
<td headers="centered  q95" class="gt_row gt_right">-0.133</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9278.897</td>
<td headers="centered  ess_tail" class="gt_row gt_right">9114.590</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_a</td>
<td headers="centered  mean" class="gt_row gt_right">0.492</td>
<td headers="centered  median" class="gt_row gt_right">0.492</td>
<td headers="centered  sd" class="gt_row gt_right">0.094</td>
<td headers="centered  mad" class="gt_row gt_right">0.094</td>
<td headers="centered  q5" class="gt_row gt_right">0.337</td>
<td headers="centered  q95" class="gt_row gt_right">0.647</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9316.962</td>
<td headers="centered  ess_tail" class="gt_row gt_right">8853.810</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_ab</td>
<td headers="centered  mean" class="gt_row gt_right">-0.361</td>
<td headers="centered  median" class="gt_row gt_right">-0.362</td>
<td headers="centered  sd" class="gt_row gt_right">0.135</td>
<td headers="centered  mad" class="gt_row gt_right">0.134</td>
<td headers="centered  q5" class="gt_row gt_right">-0.582</td>
<td headers="centered  q95" class="gt_row gt_right">-0.140</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9167.199</td>
<td headers="centered  ess_tail" class="gt_row gt_right">9311.161</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_b</td>
<td headers="centered  mean" class="gt_row gt_right">0.795</td>
<td headers="centered  median" class="gt_row gt_right">0.795</td>
<td headers="centered  sd" class="gt_row gt_right">0.093</td>
<td headers="centered  mad" class="gt_row gt_right">0.095</td>
<td headers="centered  q5" class="gt_row gt_right">0.642</td>
<td headers="centered  q95" class="gt_row gt_right">0.946</td>
<td headers="centered  rhat" class="gt_row gt_right">1.001</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">8969.964</td>
<td headers="centered  ess_tail" class="gt_row gt_right">8821.275</td></tr>
  </tbody>
  
</table>
</div>
<p>There are a few things to notice here. First, the Bayesian estimates for both the 0/1-coded and centered data are closer to zero than the GLM estimates above. The shrinkage is particularly strong for the interaction term, because we placed much tighter priors on <span class="math inline">\(\beta_{ab}\)</span> and <span class="math inline">\(\gamma_{ab}\)</span>, which pull the interaction estimates toward zero, exactly as we would expect.</p>
<p>Second, if we compare the two parameterizations, we see that the R-hat—essentially a measure of whether the chains have converged to the same distribution—is slightly lower for the centered data. There isn’t much to make of the difference here (both are very close to 1), but it does suggest slightly more stable behavior for the centered parameterization.</p>
<p>The biggest impact is on the bulk effective sample size (ESS), which reflects how much independent information the chains contain after accounting for autocorrelation. Even though we ran the same number of iterations, the centered model yields far larger ESS values, indicating much better mixing. The sampler explores the posterior much more efficiently under the centered parameterization, and in this case the improvement is quite dramatic. Importantly, these differences have nothing to do with the models themselves, since the likelihood is unchanged; rather, they reflect how easy it is for the sampler to navigate the posterior surface when the data are centered.</p>
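<p>A rough way to see why autocorrelation matters: for a chain that behaves like an AR(1) process with lag-1 autocorrelation <span class="math inline">\(\rho\)</span>, the effective sample size is approximately <span class="math inline">\(N(1-\rho)/(1+\rho)\)</span>. (This is only a heuristic; the ESS estimator used by <code>summarise_draws</code> is more sophisticated.) Plugging in illustrative values, not quantities estimated from these particular chains:</p>

```python
def ar1_ess(n_draws, rho):
    """Approximate ESS of an AR(1) chain with lag-1 autocorrelation rho:
    ESS = N * (1 - rho) / (1 + rho)."""
    return n_draws * (1 - rho) / (1 + rho)

n = 15000  # 3 chains x 5000 saved iterations, as in the fits above
print(round(ar1_ess(n, 0.80)))  # ~1700: a sticky chain, like the 0/1-coded model
print(round(ar1_ess(n, 0.20)))  # ~10000: a well-mixing chain, like the centered model
```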
<p>A comparison of the trace plots reinforces the stability that centering the data provides. The traces for the 0/1-coded data (on the left) are a bit more irregular, suggesting less efficient exploration of the posterior. In contrast, the centered parameterization (on the right) produces tighter, more stable traces with less autocorrelation, indicating that the chains are mixing more effectively. This aligns with the much larger effective sample sizes observed for the centered model.</p>
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/trace_plot.png?w=578&#038;ssl=1" data-recalc-dims="1" />
Finally, we compare the estimates of the log-odds ratios from the two models, just as we did before with the GLM fits, and it is clear that the two Bayesian models also provide the same estimates of the contrasts:</p>
<pre>get_lor_summary &lt;- function(samp, model_name) {
  dt &lt;- as.data.table(as_draws_df(samp))
  
  if (model_name == &quot;0/1-coded&quot;) {
    dt[, lOR_A := beta_a]
    dt[, lOR_B := beta_b]
    dt[, lOR_AB := beta_a + beta_b + beta_ab]
  } else {
    dt[, lOR_A := gamma_a - 0.5 * gamma_ab]
    dt[, lOR_B := gamma_b - 0.5 * gamma_ab]
    dt[, lOR_AB := gamma_a + gamma_b]
  }
  
  dt[, .(
    mean_A = mean(lOR_A),
    mean_B = mean(lOR_B),
    mean_AB = mean(lOR_AB),
    sd_A = sd(lOR_A),
    sd_B = sd(lOR_B),
    sd_AB = sd(lOR_AB)
  )]
}

lor_01 &lt;- get_lor_summary(samp_01, &quot;0/1-coded&quot;)
lor_c  &lt;- get_lor_summary(samp_c,  &quot;centered&quot;)</pre>
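<p>The key design point in <code>get_lor_summary</code> is that each contrast is computed draw by draw and only then summarized, so the posterior uncertainty of a contrast automatically reflects how the coefficients co-vary. The same mechanics can be sketched in Python with synthetic stand-in draws (the real draws come from <code>coda.samples</code> and are correlated across parameters; these are drawn independently purely for illustration):</p>

```python
import random
import statistics

random.seed(0)
n_draws = 10000

# Synthetic stand-in draws, roughly matching the centered posterior
# means/sds reported above; independent here purely for illustration.
gamma_a  = [random.gauss(0.49, 0.09) for _ in range(n_draws)]
gamma_b  = [random.gauss(0.80, 0.09) for _ in range(n_draws)]
gamma_ab = [random.gauss(-0.36, 0.14) for _ in range(n_draws)]

# Each contrast is computed per draw, then summarized
lor_a  = [a - 0.5 * ab for a, ab in zip(gamma_a, gamma_ab)]
lor_ab = [a + b for a, b in zip(gamma_a, gamma_b)]

print(statistics.mean(lor_a), statistics.stdev(lor_a))
print(statistics.mean(lor_ab), statistics.stdev(lor_ab))
```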
<div id="xwzeufbwmd" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#xwzeufbwmd table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#xwzeufbwmd thead, #xwzeufbwmd tbody, #xwzeufbwmd tfoot, #xwzeufbwmd tr, #xwzeufbwmd td, #xwzeufbwmd th {
  border-style: none;
}

#xwzeufbwmd p {
  margin: 0;
  padding: 0;
}

#xwzeufbwmd .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#xwzeufbwmd .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#xwzeufbwmd .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#xwzeufbwmd .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#xwzeufbwmd .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#xwzeufbwmd .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#xwzeufbwmd .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#xwzeufbwmd .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#xwzeufbwmd .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#xwzeufbwmd .gt_spanner_row {
  border-bottom-style: hidden;
}

#xwzeufbwmd .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#xwzeufbwmd .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#xwzeufbwmd .gt_from_md > :first-child {
  margin-top: 0;
}

#xwzeufbwmd .gt_from_md > :last-child {
  margin-bottom: 0;
}

#xwzeufbwmd .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#xwzeufbwmd .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#xwzeufbwmd .gt_row_group_first td {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_row_group_first th {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#xwzeufbwmd .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#xwzeufbwmd .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#xwzeufbwmd .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_left {
  text-align: left;
}

#xwzeufbwmd .gt_center {
  text-align: center;
}

#xwzeufbwmd .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#xwzeufbwmd .gt_font_normal {
  font-weight: normal;
}

#xwzeufbwmd .gt_font_bold {
  font-weight: bold;
}

#xwzeufbwmd .gt_font_italic {
  font-style: italic;
}

#xwzeufbwmd .gt_super {
  font-size: 65%;
}

#xwzeufbwmd .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#xwzeufbwmd .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#xwzeufbwmd .gt_indent_1 {
  text-indent: 5px;
}

#xwzeufbwmd .gt_indent_2 {
  text-indent: 10px;
}

#xwzeufbwmd .gt_indent_3 {
  text-indent: 15px;
}

#xwzeufbwmd .gt_indent_4 {
  text-indent: 20px;
}

#xwzeufbwmd .gt_indent_5 {
  text-indent: 25px;
}

#xwzeufbwmd .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#xwzeufbwmd div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="model"></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="A">log OR A</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="B">log OR B</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="AB">log OR AB</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="model" class="gt_row gt_left">0/1-coded</td>
<td headers="A" class="gt_row gt_center">0.672 (0.115)</td>
<td headers="B" class="gt_row gt_center">0.973 (0.116)</td>
<td headers="AB" class="gt_row gt_center">1.286 (0.132)</td></tr>
    <tr><td headers="model" class="gt_row gt_left">centered</td>
<td headers="A" class="gt_row gt_center">0.672 (0.116)</td>
<td headers="B" class="gt_row gt_center">0.975 (0.117)</td>
<td headers="AB" class="gt_row gt_center">1.288 (0.134)</td></tr>
  </tbody>
  
</table>
</div>
</div>
</div>
<div id="a-larger-simulation-experiment" class="section level3">
<h3>A larger simulation experiment</h3>
<p>A single data set can be misleading. So next I’ll repeat this 500 times and compare the two parameterizations across simulations. In each iteration, I generate a data set with 2000 observations, fit both models—the one with 0/1 coding and the other with centered coding—using <code>JAGS</code>, and collect summary statistics of the posteriors from each model: mean, median, standard deviation, median absolute deviation, 5th percentile, 95th percentile, R-hat, bulk ESS, and tail ESS.</p>
<pre>one_run &lt;- function(
  n = 2000,
  truth = c(alpha = -0.8, beta_a = 0.5, beta_b = 0.9, beta_ab = -0.3),
  n_chains = 3,
  burn = 1000,
  n_iter = 3000
) {
  
  dd &lt;- s_gen(
    n = n,
    alpha = truth[&quot;alpha&quot;],
    beta_a = truth[&quot;beta_a&quot;],
    beta_b = truth[&quot;beta_b&quot;],
    beta_ab = truth[&quot;beta_ab&quot;]
  )
  
  samp_01 &lt;- fit_jags(
    dd, model_01, centered = FALSE, 
    n_chains = n_chains, burn = burn, n_iter = n_iter
  )
  
  samp_c &lt;- fit_jags(
    dd, model_c, centered = TRUE, 
    n_chains = n_chains, burn = burn, n_iter = n_iter
  )
  
  get_metrics &lt;- function(samp, model_name) {
    post &lt;- as_draws_df(samp)
    summ &lt;- as.data.table(summarise_draws(post))
    summ[, model := model_name]
    summ[]
  }
  
  out &lt;- rbindlist(list(
    get_metrics(samp_01, &quot;0/1-coded&quot;),
    get_metrics(samp_c,  &quot;centered&quot;)
  ))
  
  out[]
}

nsim &lt;- 500

sim_res &lt;- rbindlist(mclapply(seq_len(nsim), function(i) {
  out &lt;- one_run()
  out[, sim := i]
  out[]
}, mc.cores = 5))</pre>
<p>Earlier we saw that, for a single data set, there was not much difference in R-hat (essentially a measure of whether the chains have converged to the same distribution) between the two models. Over repeated data sets, however, a more interesting picture emerges. The figure below shows that while <em>R-hat</em> for the 0/1-coded model is quite low, <em>R-hat</em> for the centered model is lower still and much more consistent, suggesting that mixing is stronger in the centered model.</p>
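<p>To make the diagnostic concrete, here is a minimal, self-contained sketch of how split R-hat is computed, in base R. This is an illustration only; in the simulation above, R-hat comes from <code>summarise_draws()</code>.</p>

```r
# Split R-hat by hand (illustration only; the post uses posterior::summarise_draws()).
# Each chain is split in half; R-hat compares between-half-chain variance (B)
# to within-half-chain variance (W). Values near 1 indicate convergence.
split_rhat <- function(chains) {
  n <- nrow(chains) %/% 2
  halves <- cbind(chains[1:n, , drop = FALSE],
                  chains[(n + 1):(2 * n), , drop = FALSE])
  W <- mean(apply(halves, 2, var))        # within-chain variance
  B <- n * var(colMeans(halves))          # between-chain variance
  var_plus <- (n - 1) / n * W + B / n     # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
well_mixed <- matrix(rnorm(3000), ncol = 3)          # 3 chains, same target
stuck <- well_mixed + rep(c(0, 0, 2), each = 1000)   # third chain shifted
c(split_rhat(well_mixed), split_rhat(stuck))         # ~1 vs. clearly above 1
```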
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_rhat_V1.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>The next figure shows the distribution of ratios of <em>bulk ESS</em> in the centered model relative to the 0/1-coded model. If the two models had the same effective sample size, we would expect those ratios to cluster near one. Instead, they are mostly greater than five, confirming what we saw for the individual data set.</p>
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_ess_V1.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
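<p>The ratios plotted above can be computed directly from <code>sim_res</code>. Here is a sketch with <code>data.table</code>, using a toy stand-in for <code>sim_res</code> (the column names follow <code>summarise_draws()</code>; the real object comes from the <code>mclapply()</code> loop above):</p>

```r
library(data.table)

# Toy stand-in for sim_res; in the post it is built by the simulation loop.
sim_res <- data.table(
  sim      = rep(1:2, each = 2),
  variable = "beta_ab",
  model    = rep(c("0/1-coded", "centered"), times = 2),
  ess_bulk = c(400, 2600, 350, 2450)
)

# One row per (simulation, parameter), one bulk-ESS column per model
ess_wide <- dcast(sim_res, sim + variable ~ model, value.var = "ess_bulk")
ess_wide[, ratio := centered / `0/1-coded`]
ess_wide[, .(median_ratio = median(ratio)), by = variable]
```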
<p>The key issue is posterior dependence among parameters: when parameters are highly correlated, the sampler will explore narrower regions in the posterior, which slows mixing.</p>
</div>
<div id="understanding-what-is-driving-the-performance" class="section level3">
<h3>Understanding what is driving the performance</h3>
<p>To better understand this, we can look directly at the dependence structure of the posterior draws. Correlation plots (where each point is a draw from the posterior) help explain what is driving these differences in performance. Under the 0/1-coded parameterization, the posterior exhibits strong dependence among parameters. Several pairs of coefficients show substantial correlations, reflecting the fact that different combinations of parameters can produce similar fitted values. In geometric terms, the joint posterior has an elongated, highly correlated structure. This is evident in the pairwise scatter plots, where draws fall along narrow, tilted bands rather than forming roughly circular clouds.</p>
<p><img src="https://i1.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_cor_01_V2.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>This geometry makes life difficult for the sampler. Exploring a narrower region requires smaller, correlated steps, which leads to high autocorrelation and, ultimately, low effective sample sizes.</p>
<p>In contrast, the centered parameterization produces a posterior that is nearly uncorrelated. The coefficients capture more distinct aspects of the model, and the resulting posterior is much more spherical. This greatly simplifies the exploration of the parameter space, allowing the sampler to move more freely.</p>
<p><img src="https://i1.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_cor_c_V3.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>The key point is that centering does not change the model or the scientific conclusions. It changes the geometry of the posterior distribution, and that change can have a dramatic impact on computational performance. In effect, centering makes the parameters closer to orthogonal in the posterior, reducing interference among them and improving both statistical and computational behavior.</p>
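<p>The orthogonality claim can be seen in miniature without any MCMC. Under a normal approximation, the posterior correlations of the coefficients look like <code>cov2cor(solve(crossprod(X)))</code>. Here is a base R sketch for a balanced 2×2 factorial, an illustration of the geometry rather than the JAGS model itself:</p>

```r
# Approximate posterior correlation structure implied by each design matrix.
a <- rep(0:1, each = 2)
b <- rep(0:1, times = 2)

X_01 <- cbind(`(Intercept)` = 1, a = a, b = b, ab = a * b)     # 0/1-coded
ac <- a - mean(a); bc <- b - mean(b)
X_c <- cbind(`(Intercept)` = 1, a = ac, b = bc, ab = ac * bc)  # centered

round(cov2cor(solve(crossprod(X_01))), 2)  # off-diagonals up to +/- 0.71
round(cov2cor(solve(crossprod(X_c))), 2)   # identity: orthogonal parameters
```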
<p>In the ED-LEAD study, where we are fitting hierarchical factorial models with multiple intervention components, this shift in parameterization is critical. Centering the treatment indicators leads to more stable estimation and far more efficient sampling, which is particularly important given our reliance on <code>JAGS</code>. Unlike Hamiltonian Monte Carlo (as implemented in <code>Stan</code>), which can handle correlated posteriors more effectively, the Gibbs and Metropolis-based updates used by <code>JAGS</code> are much more sensitive to posterior dependence. Improving the geometry of the posterior seems to be critical for good performance in this setting.</p>
<p><small><font color="darkkhaki">
Support: This work was supported in part by the National Institute on Aging (NIA) of the National Institutes of Health under Award Number U19AG078105, which funds the <em>Emergency departments leading the transformation of Alzheimer’s and dementia care</em> (ED-LEAD) study. The author, the leader of the Statistics Analysis Core, was the sole writer of this blog post and has no conflicts. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health.
</font></small></p>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/"> ouR data generation</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/">Same model, better shape: why centering improves MCMC</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400210</post-id>	</item>
		<item>
		<title>Better Git diff with difftastic</title>
		<link>https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/</link>
		
		<dc:creator><![CDATA[Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://masalmon.eu/2026/03/30/difftastic/</guid>

					<description><![CDATA[<p>I’m currently on a quest to better know and understand treesitter-based tooling for R.<br />
To make it short, treesitter is a tool for parsing code, for instance recognizing what is a function, an argument, a logical in a string of code.<br />
With tools bu...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/">Better Git diff with difftastic</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://masalmon.eu/2026/03/30/difftastic/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>I’m currently on a quest to better know and understand treesitter-based tooling for R.
To make it short, treesitter is a tool for parsing code, for instance recognizing what is a function, an argument, a logical in a string of code.
With tools built upon treesitter you can <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">search</a>, <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">reformat</a>, <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">lint and fix</a>, etc. your code.
Exciting stuff, running locally and deterministically on your machine.</p>
<p>Speaking of “etc.”, <a href="https://www.etiennebacher.com/" rel="nofollow" target="_blank">Etienne Bacher</a> helpfully suggested I also look at treesitter-based tooling for <em>other languages</em> to see what’s still missing in our ecosystem.
This is how I stumbled upon difftastic by Wilfred Hughes, “a structural diff tool that understands syntax”. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2728.png" alt="✨" class="wp-smiley" style="height: 1em; max-height: 1em;" />
This means that difftastic doesn’t only compare lines or “words” but actual syntax, by looking at the lines around the lines that changed (by default, 3).
Even better, it understands R out of the box<sup id="fnref:1"><a href="https://masalmon.eu/2026/03/30/difftastic/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup>.</p>
<p><em>Many thanks to Etienne Bacher not only for making me discover difftastic but also for useful feedback on this post!</em></p>
<h2 id="installing-difftastic">Installing difftastic</h2>
<p>To install difftastic I downloaded a binary file for my system from the releases of the GitHub repository,
as <a href="https://difftastic.wilfred.me.uk/installation.html" rel="nofollow" target="_blank">documented in the manual</a>.</p>
<h2 id="difftastic-on-two-files">difftastic on two files</h2>
<p>You can run difftastic on two files, a bit like you would use the <a href="https://waldo.r-lib.org/" rel="nofollow" target="_blank">waldo</a> R package on two objects.</p>
<p>Let’s compare:</p>
<pre>a &lt;- gsub(&quot;bad&quot;, &quot;good&quot;, x)
</pre><p>to</p>
<pre>a &lt;- stringr::str_replace(x, &quot;bad&quot;, &quot;good&quot;)
</pre><p>respectively saved in <code>old.R</code> and <code>new.R</code>.
The CLI is called <code>difft</code>, not difftastic.
I use the “inline” display rather than the two columns default in order to save horizontal space.</p>
<pre>difft old.R new.R --display inline
</pre><p>We’d get to this nice looking diff:</p>
<figure>
    <img src="https://i2.wp.com/masalmon.eu/2026/03/30/difftastic/oldnew.png?w=578&#038;ssl=1"
         alt="diff of the two lines of code, where &#39;gsub&#39; and &#39;, x&#39; are in red then &#39;stringr::str_replace&#39; and &#39;x&#39; in green" data-recalc-dims="1"/> 
</figure>

<p>The parentheses and <code>&quot;bad&quot;</code> and <code>&quot;good&quot;</code> arguments are ignored.</p>
<p>We can also get the JSON version of this diff; this is an unstable feature whose usage requires setting an environment variable:</p>
<pre>export DFT_UNSTABLE=yes
difft old.R new.R --display json
</pre><p>This gets us</p>
<pre>{&quot;aligned_lines&quot;:[[0,0],[1,1]],&quot;chunks&quot;:[[{&quot;lhs&quot;:{&quot;line_number&quot;:0,&quot;changes&quot;:[{&quot;start&quot;:5,&quot;end&quot;:9,&quot;content&quot;:&quot;gsub&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:23,&quot;end&quot;:24,&quot;content&quot;:&quot;,&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:25,&quot;end&quot;:26,&quot;content&quot;:&quot;x&quot;,&quot;highlight&quot;:&quot;normal&quot;}]},&quot;rhs&quot;:{&quot;line_number&quot;:0,&quot;changes&quot;:[{&quot;start&quot;:5,&quot;end&quot;:12,&quot;content&quot;:&quot;stringr&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:12,&quot;end&quot;:14,&quot;content&quot;:&quot;::&quot;,&quot;highlight&quot;:&quot;keyword&quot;},{&quot;start&quot;:14,&quot;end&quot;:25,&quot;content&quot;:&quot;str_replace&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:26,&quot;end&quot;:27,&quot;content&quot;:&quot;x&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:27,&quot;end&quot;:28,&quot;content&quot;:&quot;,&quot;,&quot;highlight&quot;:&quot;normal&quot;}]}}]],&quot;language&quot;:&quot;R&quot;,&quot;path&quot;:&quot;content/post/2026-03-26-difftastic/new.R&quot;,&quot;status&quot;:&quot;changed&quot;}
</pre><p>Now, none of this is very useful to me, because I would never compare files in this way…
I use version control!</p>
<h2 id="difftastic-with-git">difftastic with Git</h2>
<p>We can set difftastic as the external diff tool for Git globally or for the current project.</p>
<p>For instance with the gert R package, to set it locally:</p>
<pre>gert::git_config_set(&quot;diff.external&quot;, &quot;difft&quot;)
</pre><p>If I want to use the inline display I’d set:</p>
<pre>gert::git_config_set(&quot;diff.external&quot;, &quot;difft --display inline&quot;)
</pre><p>Then <code>git diff</code> will by default use difftastic.
Most interestingly for me, <code>git show --ext-diff</code> will use difftastic.
I never use <code>git diff</code> directly but I do look at more or less recent commits a lot.</p>
<p>Say I am interested in the <a href="https://github.com/r-lib/roxygen2/commit/7a1dd39866699a2b0a034bb15244c07698a1e2e7" rel="nofollow" target="_blank">commit</a> that removed roxygen2’s dependency on stringi; I’ll run:</p>
<pre>git show 7a1dd39866699a2b0a034bb15244c07698a1e2e7 --ext-diff
</pre><p>and get:</p>
<figure>
    <img src="https://i0.wp.com/masalmon.eu/2026/03/30/difftastic/strwrap.png?w=578&#038;ssl=1"
         alt="diff where the parentheses of a nested call are nicely highlighted" data-recalc-dims="1"/> 
</figure>

<p>This isn’t spectacular because this is a small diff, but I enjoy the highlighting of the parentheses of the removed nested call, and of the logical.</p>
<h2 id="cool-features-of-difftastic">Cool features of difftastic</h2>
<p>Building on two examples of the <a href="https://difftastic.wilfred.me.uk/" rel="nofollow" target="_blank">difftastic homepage</a>…</p>
<h3 id="ignoring-formatting-changes">Ignoring formatting changes</h3>
<p>Since formatters can so helpfully apply your formatting preferences,
reviewing formatting changes in a patch that’s about something else entirely is useless and annoying.
Imagine having a function definition that fits on a single line, then adding one argument to it.</p>
<p>Going from</p>
<pre>f &lt;- function(myarg1 = foo, myarg2 = bar) {}
</pre><p>to</p>
<pre>f &lt;- function(
  myarg1 = foo,
  myarg2 = bar,
  myarg3 = baz
) {}
</pre><p>Because the definition is now longer than 80 characters, your formatter might switch the definition to be on multiple lines.
But the actually interesting change is the addition of one argument.</p>
<p>Native Git diff<sup id="fnref:2"><a href="https://masalmon.eu/2026/03/30/difftastic/#fn:2" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">2</a></sup> would show:</p>
<figure>
    <img src="https://i0.wp.com/masalmon.eu/2026/03/30/difftastic/args.png?w=578&#038;ssl=1"
         alt="diff where all lines are highlighted because the function was reformatted, not only complemented with one argument" data-recalc-dims="1"/> 
</figure>

<p>Git with difftastic would show:</p>
<figure>
    <img src="https://i2.wp.com/masalmon.eu/2026/03/30/difftastic/args-better.png?w=578&#038;ssl=1"
         alt="diff where only the comma after `bar` and the line with the new argument are highlighted" data-recalc-dims="1"/> 
</figure>

<p>The matching of delimiters is why I found difftastic’s display of the roxygen2 commit more pleasing.</p>
<h3 id="matching-delimiters-in-wrappers">Matching delimiters in wrappers</h3>
<p>The Git diff can look a bit ugly when you simply move code from one function to the other.</p>
<p>Say we go from</p>
<pre>f &lt;- function() {
  1 + 1
}

</pre><p>to</p>
<pre>f &lt;- function() {
  g()
}

g &lt;- function() {
  1 + 1
}

</pre><p>Git diff would show:</p>
<figure>
    <img src="https://i1.wp.com/masalmon.eu/2026/03/30/difftastic/wrappers-bad.png?w=578&#038;ssl=1"
         alt="uncool diff that shows lines modified in both the wrapper and the function without matching delimiters" data-recalc-dims="1"/> 
</figure>

<p>Whereas Git with difftastic would show:</p>
<figure>
    <img src="https://i1.wp.com/masalmon.eu/2026/03/30/difftastic/wrappers-good.png?w=578&#038;ssl=1"
         alt="cool diff that shows `g` as a new function by highlighting its name and the left arrow, whereas the entire definiton of `f` is marked as changed." data-recalc-dims="1"/> 
</figure>

<h2 id="will-i-use-difftastic">Will I use difftastic?</h2>
<p>I really like the concept behind difftastic and the few Git commits I looked at with it rendered nicely.
Now, what’s <a href="https://github.com/Wilfred/difftastic#does-difftastic-integrate-with-my-favourite-tool" rel="nofollow" target="_blank">missing</a> for me to use difftastic a lot is its integration with the tools where I actually use Git:</p>
<ul>
<li>Positron including the GitLens extension;</li>
<li>GitHub Pull Request Files tab.</li>
</ul>
<p>In any case, I’ll continue learning about tools based on treesitter, some of which like <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">Air</a> and <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a> I can already use directly from my IDE. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p>It’s not every day we R developers look at the <a href="https://difftastic.wilfred.me.uk/" rel="nofollow" target="_blank">homepage</a> of a tool and see the R logo among the logos of other languages! <a href="https://masalmon.eu/2026/03/30/difftastic/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>To get the diff that Git would show me I ran <code>git diff --no-index old-args.R new-args.R --no-ext-diff</code>, cool trick I didn’t know about! Very glad I didn’t have to create a fake Git repo just for this. (<code>--no-ext-diff</code> because my diff in this repo would use difftastic by default!) <a href="https://masalmon.eu/2026/03/30/difftastic/#fnref:2" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</section>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://masalmon.eu/2026/03/30/difftastic/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/">Better Git diff with difftastic</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400178</post-id>	</item>
		<item>
		<title>rOpenSci News Digest, March 2026</title>
		<link>https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/03/30/news-mars-2026/</guid>

					<description><![CDATA[<p>Dear rOpenSci friends, it’s time for our monthly news roundup!  You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci!</p>
<p>rOpenSci HQ</p>
<p>rOpenSci Dev Guide 1.0.0: Trilingual and Improved<br />
rOpenSci Software...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/">rOpenSci News Digest, March 2026</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/03/30/news-mars-2026/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post <a href="https://ropensci.org/blog/2026/03/30/news-mars-2026" rel="nofollow" target="_blank">on our blog</a>. Now let’s dive into the activity at and around rOpenSci!</p>
<h2>
rOpenSci HQ
</h2><h3>
rOpenSci Dev Guide 1.0.0: Trilingual and Improved
</h3><p>rOpenSci Software Peer Review’s guidance is gathered in an online book that keeps improving! It is now available in <a href="https://devguide.ropensci.org/" rel="nofollow" target="_blank">English</a>, <a href="https://devguide.ropensci.org/es/index.es.html" rel="nofollow" target="_blank">Spanish</a> and <a href="https://devguide.ropensci.org/pt/index.pt.html" rel="nofollow" target="_blank">Portuguese</a>. Read more in the <a href="https://ropensci.org/blog/2026/03/02/devguide-1.0.0/" rel="nofollow" target="_blank">release announcement</a></p>
<h3>
Champions Program Update
</h3><p>We are still going through the Champions selection process, and we’re excited to share that the new group of mentors has already been selected and is now actively reviewing Champions applications.</p>
<p>This cohort brings together a wonderful mix of returning Champions stepping into mentorship roles, mentors continuing their contributions, and new members joining the program. The 2026 mentors are Andrea Gómez Vargas, Pablo Paccioretti, Alber Hamersson Sánchez Ipia, Erick Isaac Navarro Delgado, Francisco Cardozo, Luis Verde Arregoitia, Monika Ávila Márquez, Guadalupe Pascal, Pao Corrales, and Elio Campitelli. Together, they represent a diverse and vibrant community across Colombia, Mexico, Argentina, Brazil, and Bolivia, with some currently based in Switzerland, Canada, the United States, and Australia. We’re very happy to see this growing, interconnected network supporting the next cohort of Champions.</p>
<h3>
R-Universe update
</h3><p>You can now download artifacts and log files from R-Universe without being logged in with a GitHub account, for example <a href="https://ropensci.r-universe.dev/opencv#checktable" rel="nofollow" target="_blank">https://ropensci.r-universe.dev/opencv#checktable</a>.</p>
<h3>
Software review and usage of AI tools
</h3><p>Authors submitting new software for <a href="https://ropensci.org/software-review/" rel="nofollow" target="_blank">peer review</a> are now required to explain potential usage of generative AI tools in their package development. All submission templates now include a mandatory check-box:</p>
<pre>- [ ] Generative AI tools were used to produce some of the material in this submission.
If so, please describe usage, and include links to any relevant aspects of your repository.
</pre>
<p>This is the start of our updates to accommodate generative AI tools in package development, as described in our <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" rel="nofollow" target="_blank">recent blog post</a>. The next phase will involve updates to our <a href="https://devguide.ropensci.org/" rel="nofollow" target="_blank"><em>Dev Guide</em></a>, explaining requirements and recommendations for authors, reviewers, and editors. All updates are intended to permit generative AI tools to be used in any useful way, while minimising the burden on those who volunteer their own time to keep our software peer review service running.</p>
<h3>
Software review bot updates
</h3><p>The <code>ropensci-review-bot</code> now delivers an initial report to all new software pre-submissions and submissions, identifying the five most similar packages from both all rOpenSci packages, and all CRAN packages. The matches are generated by our <a href="https://docs.ropensci.org/pkgmatch" rel="nofollow" target="_blank">ropensci-review-tools/pkgmatch package</a> (itself reviewed in <a href="https://github.com/ropensci/software-review/issues/671" rel="nofollow" target="_blank">this review issue</a>). Matching is based on an <a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf" rel="nofollow" target="_blank">“term frequency-inverse document frequency” algorithm</a>, using inverse document frequencies from all rOpenSci and CRAN packages. Similar package reports can also be manually triggered (by editors only) with <code>@ropensci-review-bot similar packages</code>, like in <a href="https://github.com/ropensci/software-review/issues/671#issuecomment-4117805740" rel="nofollow" target="_blank">this example for the pkgmatch package itself</a>.</p>
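<p>For intuition only, here is a miniature of the tf-idf idea in base R: documents become term-weight vectors, with weights that down-weight terms appearing everywhere, and similarity is the cosine between vectors. This is not pkgmatch’s actual implementation, just a sketch of the algorithm it builds on, with hypothetical package “descriptions”.</p>

```r
# Tiny tf-idf + cosine similarity demo (hypothetical package "descriptions").
docs <- list(
  pkg_a = c("parse", "xml", "files"),
  pkg_b = c("parse", "json", "files"),
  pkg_c = c("fit", "bayesian", "models")
)
vocab <- sort(unique(unlist(docs)))
tf <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))
idf <- log(length(docs) / colSums(tf > 0))   # rare terms get more weight
tfidf <- sweep(tf, 2, idf, `*`)

cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
c(a_vs_b = cosine(tfidf["pkg_a", ], tfidf["pkg_b", ]),
  a_vs_c = cosine(tfidf["pkg_a", ], tfidf["pkg_c", ]))  # pkg_b is the closer match
```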
<h3>
Coworking
</h3><p>Read <a href="https://ropensci.org/blog/2023/06/21/coworking/" rel="nofollow" target="_blank">all about coworking</a>!</p>
<ul>
<li>Tuesday April 7th 2026, 9:00 Americas Pacific (16:00 UTC) <a href="https://ropensci.org/events/coworking-2026-04/" rel="nofollow" target="_blank">“Getting to know the CSID Network”</a> with <a href="https://ropensci.org/author/steffi-lazerte/" rel="nofollow" target="_blank">Steffi LaZerte</a> and cohosts <a href="https://ropensci.org/author/irene-ramos/" rel="nofollow" target="_blank">Irene Ramos</a> and <a href="https://ropensci.org/author/adamu-saleh-saidu" rel="nofollow" target="_blank">Adamu Saleh Saidu</a>.
<ul>
<li>Learn more about the <a href="https://csidnet.org/" rel="nofollow" target="_blank">CSID Network</a></li>
<li>Meet cohosts, Irene Ramos and Adamu Saleh Saidu, and learn more about the CSID Network and how you might get involved.</li>
</ul>
</li>
<li>Tuesday May 5th 2026, 9:00 Australia Western (01:00 UTC) <a href="https://ropensci.org/events/coworking-2026-05/" rel="nofollow" target="_blank">“Code Review with rOpenSci”</a> with <a href="https://ropensci.org/author/steffi-lazerte/" rel="nofollow" target="_blank">Steffi LaZerte</a> and cohost <a href="https://ropensci.org/author/liz-hare/" rel="nofollow" target="_blank">Liz Hare</a>.
<ul>
<li>Explore resources for Code Review</li>
<li>Sign up to volunteer to do <a href="https://airtable.com/app8dssb6a7PG6Vwj/shrnfDI2S9uuyxtDw" rel="nofollow" target="_blank">software peer-review</a> at rOpenSci</li>
<li>Meet cohost, Liz Hare, and discuss resources for Code Review with rOpenSci.</li>
</ul>
</li>
</ul>
<p>And remember, you can always cowork independently on work related to R, work on packages that tend to be neglected, or work on whatever you need to get done!</p>
<h2>
Software <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4e6.png" alt="📦" class="wp-smiley" style="height: 1em; max-height: 1em;" />
</h2><h3>
New packages
</h3><p>The following package recently became a part of our software suite:</p>
<ul>
<li><a href="https://docs.ropensci.org/suwo" rel="nofollow" target="_blank">suwo</a>, developed by Marcelo Araya-Salas together with Jorge Elizondo-Calvo and Alejandro Rico-Guevara: Streamline searching/downloading of nature media files (e.g. audios, photos) from online repositories. The package offers functions for obtaining media metadata from online repositories, downloading associated media files and updating data sets with new records. It has been <a href="https://github.com/ropensci/software-review/issues/729" rel="nofollow" target="_blank">reviewed</a> by Eric R. Scott and Hugo Gruson.</li>
</ul>
<p>Discover <a href="https://ropensci.org/packages" rel="nofollow" target="_blank">more packages</a>, read more about <a href="https://ropensci.org/software-review" rel="nofollow" target="_blank">Software Peer Review</a>.</p>
<h3>
New versions
</h3><p>The following eleven packages have had an update since the last newsletter: <a href="https://docs.ropensci.org/cffr" title="Generate Citation File Format (cff) Metadata for R Packages" rel="nofollow" target="_blank">cffr</a> (<a href="https://github.com/ropensci/cffr/releases/tag/v1.3.0" rel="nofollow" target="_blank"><code>v1.3.0</code></a>), <a href="https://docs.ropensci.org/pkgmatch" title="Find R Packages Matching Either Descriptions or Other R Packages" rel="nofollow" target="_blank">pkgmatch</a> (<a href="https://github.com/ropensci-review-tools/pkgmatch/releases/tag/v0.5.2" rel="nofollow" target="_blank"><code>v0.5.2</code></a>), <a href="https://docs.ropensci.org/tarchetypes" title="Archetypes for Targets" rel="nofollow" target="_blank">tarchetypes</a> (<a href="https://github.com/ropensci/tarchetypes/releases/tag/0.14.1" rel="nofollow" target="_blank"><code>0.14.1</code></a>), <a href="https://docs.ropensci.org/rgbif" title="Interface to the Global Biodiversity Information Facility API" rel="nofollow" target="_blank">rgbif</a> (<a href="https://github.com/ropensci/rgbif/releases/tag/v3.8.5" rel="nofollow" target="_blank"><code>v3.8.5</code></a>), <a href="https://docs.ropensci.org/saperlipopette" title="Create Example Git Messes" rel="nofollow" target="_blank">saperlipopette</a> (<a href="https://github.com/ropensci-training/saperlipopette/releases/tag/v0.1.1" rel="nofollow" target="_blank"><code>v0.1.1</code></a>), <a href="https://docs.ropensci.org/gutenbergr" title="Download and Process Public Domain Works from Project Gutenberg" rel="nofollow" target="_blank">gutenbergr</a> (<a href="https://github.com/ropensci/gutenbergr/releases/tag/v0.5.0" rel="nofollow" target="_blank"><code>v0.5.0</code></a>), <a href="https://docs.ropensci.org/trud" title="Query the NHS TRUD API" rel="nofollow" target="_blank">trud</a> (<a href="https://github.com/ropensci/trud/releases/tag/v0.2.1" rel="nofollow" target="_blank"><code>v0.2.1</code></a>), <a 
href="https://docs.ropensci.org/naijR" title="Operations to Ease Data Analyses Specific to Nigeria" rel="nofollow" target="_blank">naijR</a> (<a href="https://github.com/ropensci/naijR/releases/tag/v0.7.0" rel="nofollow" target="_blank"><code>v0.7.0</code></a>), <a href="https://docs.ropensci.org/sasquatch" title="Use SAS, R, and quarto Together" rel="nofollow" target="_blank">sasquatch</a> (<a href="https://github.com/ropensci/sasquatch/releases/tag/v0.1.3" rel="nofollow" target="_blank"><code>v0.1.3</code></a>), <a href="https://docs.ropensci.org/lingtypology" title="Linguistic Typology and Mapping" rel="nofollow" target="_blank">lingtypology</a> (<a href="https://github.com/ropensci/lingtypology/releases/tag/v1.1.25" rel="nofollow" target="_blank"><code>v1.1.25</code></a>), and <a href="https://docs.ropensci.org/rerddap" title="General Purpose Client for ERDDAP&#x2122; Servers" rel="nofollow" target="_blank">rerddap</a> (<a href="https://github.com/ropensci/rerddap/releases/tag/v1.2.3" rel="nofollow" target="_blank"><code>v1.2.3</code></a>).</p>
<p>Post on dfms release: <a href="https://sebkrantz.github.io/Rblog/2026/01/29/releasing-dfms-1-0-fast-and-feature-rich-estimation-of-dynamic-factor-models-in-r/" rel="nofollow" target="_blank">Releasing dfms 1.0: Fast and Feature-Rich Estimation of Dynamic Factor Models in R</a>.</p>
<h2>
Software Peer Review
</h2><p>There are fifteen recently closed and active submissions and five submissions on hold. Issues are at different stages:</p>
<ul>
<li>
<p>One at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%226/approved%22" rel="nofollow" target="_blank">‘6/approved’</a>:</p>
<ul>
<li><a href="https://github.com/ropensci/software-review/issues/729" rel="nofollow" target="_blank">suwo</a>, Access Nature Media Repositories Through R. Submitted by <a href="https://marce10.github.io/" rel="nofollow" target="_blank">Marcelo Araya-Salas</a>.</li>
</ul>
</li>
<li>
<p>One at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%225/awaiting-reviewer(s)-response%22" rel="nofollow" target="_blank">‘5/awaiting-reviewer(s)-response’</a>:</p>
<ul>
<li><a href="https://github.com/ropensci/software-review/issues/671" rel="nofollow" target="_blank">pkgmatch</a>, Find R Packages Matching Either Descriptions or Other R Packages. Submitted by <a href="https://mpadge.github.io/" rel="nofollow" target="_blank">mark padgham</a>.</li>
</ul>
</li>
<li>
<p>Two at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%224/review(s)-in-awaiting-changes%22" rel="nofollow" target="_blank">‘4/review(s)-in-awaiting-changes’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/741" rel="nofollow" target="_blank">logolink</a>, An Interface for Running NetLogo Simulations. Submitted by <a href="http://danielvartan.com/" rel="nofollow" target="_blank">Daniel Vartanian</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/615" rel="nofollow" target="_blank">galamm</a>, Generalized Additive Latent and Mixed Models. Submitted by <a href="https://osorensen.github.io/" rel="nofollow" target="_blank">Øystein Sørensen</a>. (Stats).</p>
</li>
</ul>
</li>
<li>
<p>Six at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%223/reviewer(s)-assigned%22" rel="nofollow" target="_blank">‘3/reviewer(s)-assigned’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/760" rel="nofollow" target="_blank">pvEBayes</a>, Empirical Bayes Methods for Pharmacovigilance. Submitted by <a href="https://github.com/YihaoTancn" rel="nofollow" target="_blank">Yihao Tan</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/754" rel="nofollow" target="_blank">saperlipopette</a>, Create Example Git Messes. Submitted by <a href="https://masalmon.eu/" rel="nofollow" target="_blank">Maëlle Salmon</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/730" rel="nofollow" target="_blank">ernest</a>, A Toolkit for Nested Sampling. Submitted by <a href="https://github.com/kylesnap" rel="nofollow" target="_blank">Kyle Dewsnap</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/718" rel="nofollow" target="_blank">rcrisp</a>, Automate the Delineation of Urban River Spaces. Submitted by <a href="https://github.com/cforgaci" rel="nofollow" target="_blank">Claudiu Forgaci</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/709" rel="nofollow" target="_blank">reviser</a>, Tools for Studying Revision Properties in Real-Time Time Series Vintages. Submitted by <a href="https://marcburri.github.io/" rel="nofollow" target="_blank">Marc Burri</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/704" rel="nofollow" target="_blank">priorsense</a>, Prior Diagnostics and Sensitivity Analysis. Submitted by <a href="https://github.com/n-kall" rel="nofollow" target="_blank">Noa Kallioinen</a>. (Stats).</p>
</li>
</ul>
</li>
<li>
<p>Two at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%222/seeking-reviewer(s)%22" rel="nofollow" target="_blank">‘2/seeking-reviewer(s)’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/750" rel="nofollow" target="_blank">nycOpenData</a>, Convenient Access to NYC Open Data API Endpoints. Submitted by <a href="https://github.com/martinezc1" rel="nofollow" target="_blank">Christian Martinez</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/743" rel="nofollow" target="_blank">RAMEN</a>, Regional Association of Methylome variability with the Exposome and geNome. Submitted by <a href="https://erick-navarrodelgado.netlify.app/" rel="nofollow" target="_blank">Erick Navarro-Delgado</a>.</p>
</li>
</ul>
</li>
<li>
<p>Three at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%221/editor-checks%22" rel="nofollow" target="_blank">‘1/editor-checks’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/744" rel="nofollow" target="_blank">RAQSAPI</a>, A Simple Interface to the US EPA Air Quality System Data Mart API. Submitted by <a href="https://github.com/mccroweyclinton-EPA" rel="nofollow" target="_blank">mccroweyclinton-EPA</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/740" rel="nofollow" target="_blank">fcmconfr</a>, Fuzzy Cognitive Map Analysis in R. Submitted by <a href="https://github.com/bhroston" rel="nofollow" target="_blank">benroston</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/717" rel="nofollow" target="_blank">coevolve</a>, Fit Bayesian Generalized Dynamic Phylogenetic Models using Stan. Submitted by <a href="https://scottclaessens.github.io/" rel="nofollow" target="_blank">Scott Claessens</a>. (Stats).</p>
</li>
</ul>
</li>
</ul>
<p>Find out more about <a href="https://ropensci.org/software-review" rel="nofollow" target="_blank">Software Peer Review</a> and how to get involved.</p>
<h2>
On the blog
</h2><!-- Do not forget to rebase your branch! -->
<h3>
Software Review
</h3><ul>
<li>
<p><a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy" rel="nofollow" target="_blank">Software Review in the Era of AI: What We Are Testing at rOpenSci</a> by Mark Padgham, Noam Ross, Maëlle Salmon, Yanina Bellini Saibene, Mauro Lepore, Emily Riederer, Jouni Helske, and Francisco Rodriguez-Sanchez. rOpenSci is testing preliminary policies on the use of generative AI tools, with proposed updates to documentation and procedures for authors submitting software for review, for editors, and for reviewers.</p>
</li>
<li>
<p><a href="https://ropensci.org/blog/2026/03/02/devguide-1.0.0" rel="nofollow" target="_blank">rOpenSci Dev Guide 1.0.0: Trilingual and Improved</a> by Maëlle Salmon, Mark Padgham, and Noam Ross. Updates in version 1.0.0 of the online book ‘rOpenSci Packages: Development, Maintenance, and Peer Review’. Other languages: <a href='https://ropensci.org/es/blog/2026/03/02/r_open_sci_dev_guide_1_0_0_triling%C3%BCe_y_mejorada' lang='es' rel="nofollow" target="_blank">rOpenSci Dev Guide 1.0.0: Trilingüe y mejorada (es)</a>, <a href='https://ropensci.org/pt/blog/2026/03/02/guia_de_desenvolvimento_da_r_open_sci_1_0_0_tril%C3%ADngue_e_aprimorado' lang='pt' rel="nofollow" target="_blank">Guia de desenvolvimento da rOpenSci 1.0.0: trilíngue e aprimorado (pt)</a>.</p>
</li>
</ul>
<figure class="center"><img src="https://i2.wp.com/ropensci.org/blog/2026/03/30/news-mars-2026/cover.png?w=400&#038;ssl=1"
alt="cover of rOpenSci dev guide, showing a package production line with small humans discussing, examining and promoting packages"  data-recalc-dims="1">
</figure>
<ul>
<li><a href="https://ropensci.org/blog/2026/03/10/patentsview-breaking-release" rel="nofollow" target="_blank">Breaking Release of the patentsview R Package</a> by Russ Allen and Chris Baker.</li>
</ul>
<h2>
Calls for contributions
</h2><h3>
Calls for maintainers
</h3><p>If you’re interested in maintaining any of the R packages below, you might enjoy reading our blog post <a href="https://ropensci.org/blog/2023/02/07/what-does-it-mean-to-maintain-a-package/" rel="nofollow" target="_blank">What Does It Mean to Maintain a Package?</a>.</p>
<ul>
<li>
<p><a href="https://docs.ropensci.org/NLMR" rel="nofollow" target="_blank">NLMR</a>, R package to simulate neutral landscape models. <a href="https://github.com/ropensci/NLMR/issues/116" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/landscapetools" rel="nofollow" target="_blank">landscapetools</a>, R package for some of the less-glamorous tasks involved in landscape analysis. <a href="https://github.com/ropensci/landscapetools/issues/48" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/hddtools" rel="nofollow" target="_blank">hddtools</a>, Tools to discover hydrological data, accessing catalogues and databases from various data providers. <a href="https://github.com/ropensci/hddtools/issues/36" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/qualtRics/" rel="nofollow" target="_blank">qualtRics</a>, download Qualtrics survey data. <a href="https://github.com/ropensci/qualtRics/issues/383" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
</ul>
<h3>
Calls for contributions
</h3><p>Refer to our <a href="https://ropensci.org/help-wanted/" rel="nofollow" target="_blank">help wanted page</a> – before opening a PR, we recommend asking in the issue whether help is still needed.</p>
<h2>
Package development corner
</h2><p>Some useful tips for R package developers. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f440.png" alt="👀" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>
A new R core member!
</h3><p>The R Foundation announced that <a href="https://uk.linkedin.com/in/heathrturnr" rel="nofollow" target="_blank">Heather Turner</a> has joined the <a href="https://www.r-project.org/contributors.html" rel="nofollow" target="_blank">R Core Team</a>! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f389.png" alt="🎉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>
How to browse the R mailing lists
</h3><p>The <a href="https://www.r-project.org/mail.html" rel="nofollow" target="_blank">official mailing lists of the R project</a> like <a href="https://blog.r-hub.io/2019/04/11/r-package-devel/" rel="nofollow" target="_blank">R-package-devel</a> are full of important and useful information. But how do you browse them, given that the default website is not easy to search? You can use the <a href="https://mail-archive.com/r-devel@r-project.org/" rel="nofollow" target="_blank">mail-archive</a> website (thanks to Hugo Gruson for the reminder!) or a new project by James Balamuta: the <a href="https://r-mailing-lists.thecoatlessprofessor.com/" rel="nofollow" target="_blank">R Mailing Lists Archive</a>!</p>
<h3>
“Claude Code: Setting up ast-grep with R support”
</h3><p>Thanks to Mauro Lepore for sharing this blog post by Emil Hvitfeldt: <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">“Claude Code: Setting up ast-grep with R support”</a>. ast-grep is a tool for querying code by syntax rather than brittle regular expressions. The blog post describes how to add R support to this tool, and how to take advantage of it when using Claude.</p>
<h3>
On muffling messages from packages
</h3><p>A follow-up on our post <a href="https://ropensci.org/blog/2024/02/06/verbosity-control-packages/" rel="nofollow" target="_blank">“Please Shut Up! Verbosity Control in Packages”</a>.</p>
<ul>
<li>With the {cli} R package you can change the default handler for messages. See the <a href="https://cli.r-lib.org/articles/semantic-cli.html#cli-messages" rel="nofollow" target="_blank">docs</a>. It seems mostly used to muffle messages, e.g. in <a href="https://github.com/etiennebacher/flir/blob/9254cd01d258d0bafcee41a44e5caa7104fed832/R/lint.R#L104" rel="nofollow" target="_blank">flir</a>.</li>
<li>Here’s how the usethis R package <a href="https://github.com/r-lib/usethis/commit/f0f3f91494a1b15c1b08ee78dc73ab7d1cf8b6a8" rel="nofollow" target="_blank">muffles gert message selectively</a>.</li>
</ul>
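<p>As a minimal base-R sketch of the selective approach (the function name and condition class below are made up for illustration; they are not from cli, usethis, or gert), a package can signal messages with a custom condition class, which callers can then muffle by class while letting other messages through:</p>
<figure class="highlight"><pre># Signal a message that carries a custom condition class
notify_progress &lt;- function(msg) {
  cnd &lt;- simpleMessage(paste0(msg, &quot;\n&quot;))
  class(cnd) &lt;- c(&quot;mypkg_progress&quot;, class(cnd))
  message(cnd)
}

noisy_task &lt;- function() {
  notify_progress(&quot;step 1 done&quot;)   # classed, muffled below
  message(&quot;an ordinary message&quot;)   # unclassed, still printed below
  invisible(TRUE)
}

# Since R 4.0.0, suppressMessages() takes a `classes` argument, so only
# the classed progress messages are silenced:
suppressMessages(noisy_task(), classes = &quot;mypkg_progress&quot;)</pre></figure>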
<h2>
Last words
</h2><p>Thanks for reading! If you want to get involved with rOpenSci, check out our <a href="https://contributing.ropensci.org/" rel="nofollow" target="_blank">Contributing Guide</a> that can help direct you to the right place, whether you want to make code contributions, non-code contributions, or contribute in other ways like sharing use cases. You can also support our work through <a href="https://ropensci.org/donate" rel="nofollow" target="_blank">donations</a>.</p>
<p>If you haven’t subscribed to our newsletter yet, you can <a href="https://ropensci.org/news/" rel="nofollow" target="_blank">do so via a form</a>. Until it’s time for our next newsletter, you can keep in touch with us via our <a href="https://ropensci.org/" rel="nofollow" target="_blank">website</a> and <a href="https://hachyderm.io/@rOpenSci" rel="nofollow" target="_blank">Mastodon account</a>.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/03/30/news-mars-2026/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/">rOpenSci News Digest, March 2026</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400176</post-id>	</item>
		<item>
		<title>Pacific island energy supply by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/03/30/pacific-energy</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> With the conflict in Iran causing worldwide disruption to energy markets, I have both a work and personal interest in energy supply in Pacific islands, which led me to this blog post. Here I look at just two aspects of energy: electricity generation, a...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/">Pacific island energy supply by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://freerangestats.info/blog/2026/03/30/pacific-energy"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>With the conflict in Iran causing worldwide disruption to energy markets, I have both a work and personal interest in energy supply in Pacific islands, which led me to this blog post. Here I look at just two aspects of energy: electricity generation, and household cooking. Nothing fancy here, just accessing some data and drawing a couple of plots.</p>

<h2 id="electricity-generation">Electricity generation</h2>

<p>Here is the <em>source</em> of electricity for Pacific island countries, plus Australia and New Zealand, collated by Our World In Data from Energy Institute data that ultimately comes from government estimates:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0320-pict-electricity-mix.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0320-pict-electricity-mix.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>There’s a pretty obvious story here: most of the Pacific is <strong>very</strong> dependent on “oil” (in the form of diesel) to generate most of its electricity. Some small steps towards renewables have been taken in recent years, but the vulnerability to a price or availability shock for diesel is pretty obvious.</p>

<p>Here’s the code for producing that, using the valuable <code>owidapi</code> R package to access the Our World in Data API.</p>

<figure class="highlight"><pre>#---------------Set up-----------------
library(owidapi)
library(tidyverse)
library(countrycode)
library(WDI)
library(jsonlite)
library(janitor)
library(httr2)
library(scales)  # for percent_format(), used in both charts


pic_codes &lt;- 
  c(
    &quot;ASM&quot;, &quot;COK&quot;, &quot;FSM&quot;, &quot;FJI&quot;, &quot;PYF&quot;, &quot;GUM&quot;, &quot;KIR&quot;, &quot;MHL&quot;, &quot;NRU&quot;, &quot;NCL&quot;,
    &quot;NIU&quot;, &quot;MNP&quot;, &quot;PLW&quot;, &quot;PNG&quot;, &quot;PCN&quot;, &quot;WSM&quot;, &quot;SLB&quot;, &quot;TKL&quot;, &quot;TON&quot;, &quot;TUV&quot;,
    &quot;VUT&quot;, &quot;WLF&quot;, &quot;AUS&quot;, &quot;NZL&quot;
  )
stopifnot(length(pic_codes) == 24)

# visual check we've got the right country codes for the Pacific:
countrycode::countrycode(pic_codes, origin = &quot;iso3c&quot;, destination = &quot;country.name.en&quot;)

#=======================electricity source===================

palette &lt;- c(
  coal = &quot;brown&quot;,
  gas = &quot;magenta&quot;,
  oil = &quot;red&quot;,
  kerosene = &quot;red&quot;,
  electricity = &quot;purple&quot;,
  solar = &quot;yellow&quot;,
  wind = &quot;steelblue&quot;,
  hydro = &quot;darkblue&quot;,
  bioenergy = &quot;lightgreen&quot;,
  charcoal = &quot;grey&quot;,
  biomass = &quot;darkgreen&quot;,
  'other renewables' = &quot;darkgreen&quot;
)

#-------------------electricity mix-----------------
elec_mix &lt;- owid_get(
  chart_id = &quot;share-elec-by-source&quot;,
  entities  = pic_codes
)

elec_data &lt;- elec_mix |&gt; 
   rename(country = entity_name) |&gt; 
   select(-entity_id) |&gt; 
   gather(variable, value, -country, -year) |&gt;
   filter(value != 0) |&gt; 
   filter(year &gt; 2001) |&gt; 
   mutate(variable = gsub(&quot;_share_of_electricity__pct&quot;, &quot;&quot;, variable, fixed = TRUE),
          variable = gsub(&quot;_&quot;, &quot; &quot;, variable),
          variable = gsub(&quot; excluding bioenergy&quot;, &quot;&quot;, variable),
          variable = fct_drop(variable)) |&gt; 
   mutate(variable = fct_relevel(variable, c(&quot;bioenergy&quot;, &quot;hydro&quot;, &quot;other renewables&quot;), after = Inf)) |&gt; 
   mutate(country = fct_relevel(country, c(&quot;Australia&quot;, &quot;New Zealand&quot;), after = Inf)) |&gt; 
  group_by(country) |&gt; 
  mutate(prop_pc = sum(value[variable %in% c(&quot;oil&quot;, &quot;gas&quot;) & year == max(year)]) 
         / sum(value[year == max(year)])) |&gt; 
  ungroup() |&gt; 
  mutate(country = fct_reorder(country, prop_pc))

# Draw chart
elec_data |&gt; 
  ggplot(aes(x = year, y = value, fill = variable)) +
  facet_wrap(~country, ncol = 5) +
  geom_col() +
  scale_fill_manual(values = palette) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(y = &quot;Percentage of electricity&quot;,
       fill = &quot;Source:&quot;,
       title = &quot;Share of electricity by source&quot;,
       subtitle = &quot;Countries shown in increasing order of vulnerability of electricity to a petrochemicals price or availability crisis.&quot;,
       x = &quot;&quot;,
       caption = &quot;Source: Ember (2026); Energy Institute - Statistical Review of World Energy (2025). Data processed by Our World In Data.&quot;) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))</pre></figure>

<h2 id="cooking-fuel">Cooking fuel</h2>

<p>OK, so electricity generation could be threatened by a lack of diesel. What about household cooking? This next chart draws on the definitive World Health Organization Household Energy Database, which models (based on whatever household survey data is available) what households use to cook:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0320-pict-cooking-fuel.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0320-pict-cooking-fuel.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Again, we see a lot of reliance on petrochemical products, particularly kerosene and liquefied natural gas. The latter has been promoted as a relatively clean and healthy fuel to cook with compared to burning biomass (e.g. wood, coconuts).</p>

<p>The larger Melanesian countries, with their high rural populations, still make the greatest use of biomass for cooking. Most Pacific island countries do most of their cooking with oil- or gas-derived energy (remembering from the first chart that ‘electricity’ often ultimately means diesel).</p>

<p>Here’s the code to produce that chart. I used an LLM (I forget which) for the code to access the API itself, but I tested it and tweaked it to match my style, and the chart of course is all my own code.</p>

<figure class="highlight"><pre>#--------------------cooking-------------------
# The definitive source is the WHO  WHO Household Energy Database 
# which draws on various household surveys
# See https://www.who.int/data/gho/data/themes/air-pollution/cooking-fuel-and-technology-database-by-fuel-category

# next half dozen lines of code were supplied by Co-pilot and minimally
# tweaked by me for my style
indicator_code &lt;- &quot;PHE_HHAIR_PROP_POP_CATEGORY_FUELS&quot;  # % of population by fuel type
url &lt;- paste0(&quot;https://ghoapi.azureedge.net/api/&quot;, indicator_code)

resp &lt;- request(url) |&gt; 
  req_headers(`Accept` = &quot;application/json&quot;)  |&gt; 
  req_perform()

cooking_data &lt;- fromJSON(resp_body_string(resp), flatten = TRUE)$value |&gt;
  as_tibble() |&gt; 
  clean_names()


pic_cooking_data &lt;- cooking_data |&gt; 
  filter(spatial_dim %in% pic_codes) |&gt; 
  filter(dim1 == &quot;RESIDENCEAREATYPE_TOTL&quot;) |&gt; 
  mutate(fuel_type = tolower(gsub(&quot;HOUSEHOLDCOOKINGFUEL_FUEL_&quot;, &quot;&quot;, dim2))) |&gt; 
  mutate(year = as.numeric(time_dimension_value)) |&gt; 
  select(value = numeric_value,
         iso3_code = spatial_dim,
         year,
         fuel_type) |&gt; 
  mutate(country = countrycode(iso3_code, origin = &quot;iso3c&quot;, destination = &quot;country.name.en&quot;),
         country = gsub(&quot;Federated States&quot;, &quot;Fed St&quot;, country)) |&gt;
  group_by(country) |&gt; 
  mutate(prop_gke = sum(value[fuel_type %in% c(&quot;gas&quot;, &quot;kerosene&quot;, &quot;electricity&quot;) & year == max(year)]) 
         / sum(value[year == max(year)])) |&gt; 
  ungroup() |&gt; 
  mutate(country = fct_reorder(country, prop_gke))

# Draw chart
pic_cooking_data |&gt; 
  ggplot(aes(y = value, x = year, fill = fuel_type)) +
  facet_wrap(~country, ncol = 5) +
  # The numbers don't always add up to 100, because these are modelled
  # estimates that are not fully MECE and don't count dual fuels.
  # Good practice is to not force them to sum to 100%.
  geom_area() +
  scale_fill_manual(values = palette) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(title = &quot;Household primary fuel used for cooking&quot;,
       subtitle = &quot;Estimates are modelled by WHO, and not adding up to 100% is a known limitation.
Countries shown in increasing order of vulnerability of cooking to a petrochemicals price or availability crisis.&quot;,
       x = &quot;&quot;,
       fill = &quot;Fuel type:&quot;,
       y = &quot;Proportion of households&quot;,
       caption = &quot;Source: WHO Household Energy Database&quot;)</pre></figure>

<p>That’s all, just a quick one today.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/03/30/pacific-energy"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/">Pacific island energy supply by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400172</post-id>	</item>
		<item>
		<title>Navigating Financial Statement And The Story It Tells Us &#8211; A Note To Myself</title>
		<link>https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/</link>
		
		<dc:creator><![CDATA[r on Everyday Is A School Day]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.kenkoonwong.com/blog/financial-statement/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>📊 Dipping my toes into financial statements — income, balance sheet &#038; cash flow. Still don’t fully get it, but slowly piecing together the story these numbers tell. Warren Buffett makes it look easy 😅 Baby steps! 🌱</p>
<p>Motivations</p>
<p>      ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/">Navigating Financial Statement And The Story It Tells Us – A Note To Myself</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.kenkoonwong.com/blog/financial-statement/"> r on Everyday Is A School Day</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<blockquote>
<p><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Dipping my toes into financial statements — income, balance sheet &#038; cash flow. Still don’t fully get it, but slowly piecing together the story these numbers tell. Warren Buffett makes it look easy <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f605.png" alt="😅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Baby steps! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f331.png" alt="🌱" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
</blockquote>




<h2 id="motivations">Motivations
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#motivations" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>I’ve always wanted to learn financial statements: what they mean, what they tell us, what Warren Buffett sees in them. Following the book 
<a href="https://www.amazon.com/Warren-Buffett-Interpretation-Financial-Statements/dp/1849833192" rel="nofollow" target="_blank">Warren Buffett and the Interpretation of Financial Statements: The Search for the Company with a Durable Competitive Advantage</a> and 
<a href="https://datacamp.pxf.io/15bPYD" rel="nofollow" target="_blank">Datacamp: Analyzing Financial Statement in Python</a>, I’ve made some notes for myself and also created the metrics functions, so that I can use and view them easily in the future. I’ll be honest, I still don’t fully understand it, but at least I can refer back to this as I look at these statements more frequently.</p>




<h4 id="disclaimer">Disclaimer:
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#disclaimer" rel="nofollow" target="_blank"></a>
</h4>
<p><em>This is purely for educational purposes. This is not financial advice, nor am I a financial advisor. This is a note to myself. If you find any mistakes or errors, please let me know. Thanks! Also, there is a lot of information in each section; I won’t be covering all of it, just mostly the metrics from the book and the points I found interesting.</em></p>




<h2 id="objectives">Objectives:
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#objectives" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#skeleton" rel="nofollow" target="_blank">The Skeleton of Financial Statements</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#example" rel="nofollow" target="_blank">Let’s Take An Example</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#income" rel="nofollow" target="_blank">Income Statement</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#gpm" rel="nofollow" target="_blank">Gross Profit Margin</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#depreciation" rel="nofollow" target="_blank">Depreciation</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#interest" rel="nofollow" target="_blank">Interest Payment to Operating Income</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#incomebeforetax" rel="nofollow" target="_blank">Income Before Tax</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#incomeaftertax" rel="nofollow" target="_blank">Income After Tax</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#netearnings" rel="nofollow" target="_blank">Net Earnings</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#share" rel="nofollow" target="_blank">Per Share Earning</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#operatingmargin" rel="nofollow" target="_blank">Operating Margin</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric1" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#balance" rel="nofollow" target="_blank">Balance Sheet</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#asset" rel="nofollow" target="_blank">Current Assets</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#receivables" rel="nofollow" target="_blank">Net Receivables To Gross Sale Ratio</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#currentratio" rel="nofollow" target="_blank">The Current Ratio</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#ppe" rel="nofollow" target="_blank">Property, Plant, and Equipment</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#short" rel="nofollow" target="_blank">Short Term Debt</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#long" rel="nofollow" target="_blank">Long Term Debt</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#retained" rel="nofollow" target="_blank">Retained Earnings</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#treasury" rel="nofollow" target="_blank">Treasury Stock</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#ROSE" rel="nofollow" target="_blank">Return On Shareholders’ Equity</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric2" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#cashflow" rel="nofollow" target="_blank">Cash Flow</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#operating" rel="nofollow" target="_blank">Operating Income</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#capex" rel="nofollow" target="_blank">Capital Expenditure</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#buyback" rel="nofollow" target="_blank">Stock Buyback</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric3" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#combine" rel="nofollow" target="_blank">Combine All Metrics</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#another" rel="nofollow" target="_blank">Let’s Look At Another Example</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#opportunities" rel="nofollow" target="_blank">Opportunities For Improvement</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#lessons" rel="nofollow" target="_blank">Lessons Learnt</a></li>
</ul>




<h2 id="skeleton">The Skeleton of Financial Statements
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#skeleton" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>A financial statement is a formal record of a company’s financial activities and position, typically consisting of three core components: the <code>income statement</code> (which shows revenues earned and expenses incurred to calculate profit or loss over a period), the <code>balance sheet</code> (which presents what the company owns as assets, what it owes as liabilities, and the difference between them as equity at a specific point in time), and the <code>cash flow</code> statement (which tracks the actual movement of cash in and out of the business through operating, investing, and financing activities). The <code>income statement reveals profitability</code>, the balance sheet shows financial position, and the <code>cash flow statement shows liquidity</code> and how money actually moves through the business.</p>
<p>If we were to think of a kid’s lemonade shop, the <code>income statement</code> would show how much money the shop made from selling lemonade and how much it spent on ingredients and on paying Johnny an hourly wage to sell it (salary), to calculate the profit. The <code>balance sheet</code> would list the shop’s assets (like cash in the register, inventory of lemons and sugar, and any equipment) and liabilities (like loans or unpaid bills &#8211; money you borrowed from your parents to buy all of the above) to show the net worth of the business at a given moment. The <code>cash flow statement</code> would track the actual cash coming in from customers and going out for expenses, giving insight into whether the shop has enough liquidity to cover its day-to-day operations.</p>
<p>It sounds simple in the big picture, but this is just the basic skeleton of financial statements. There are many nuances and details we need to understand to really grasp the story these statements are telling us. Each section has its own items, and some of these items are good at forming different metrics to tell the story of how the lemonade business is doing. Below is just a snapshot of Apple’s financial statement.</p>




<h4 id="income-statement">Income Statement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#income-statement" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/income1.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/income2.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h4 id="balance-sheet">Balance Sheet
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#balance-sheet" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/balance1.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/balance2.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h4 id="cash-flow">Cash Flow
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#cash-flow" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/cash.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h2 id="example">Let’s Take An Example
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#example" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Let’s go to 
<a href="https://www.alphavantage.co/" rel="nofollow" target="_blank">Alpha Vantage</a> and create a free api key and then pull Apple’s 10 year financial statement and go through as an exercise.</p>
<p>
<a href="https://www.sec.gov/Archives/edgar/data/320193/000032019325000079/aapl-20250927.htm#i719388195b384d85a4e238ad88eba90a_181" rel="nofollow" target="_blank">https://www.sec.gov/Archives/edgar/data/320193/000032019325000079/aapl-20250927.htm#i719388195b384d85a4e238ad88eba90a_181</a></p>
<pre>library(httr)
library(jsonlite)
library(tidyverse)

api_key &lt;- &quot;your_api_key_here&quot; # safer: store this in .Renviron, as in the combine_all code below

## Create a function to pull data
get_data &lt;- function(fx,ticker) {
  raw &lt;- GET(paste0(
    &quot;https://www.alphavantage.co/query?function=&quot;,fx,
    &quot;&#038;symbol=&quot;,ticker,&quot;&#038;apikey=&quot;, api_key
  )) |&gt;
    content(as = &quot;text&quot;, encoding = &quot;UTF-8&quot;) |&gt;
    fromJSON()
  
  df &lt;- raw$annualReports |&gt; 
    as_tibble() |&gt; 
    mutate(across(-c(fiscalDateEnding, reportedCurrency), as.numeric)) |&gt;
    mutate(fiscalDateEnding = as.Date(fiscalDateEnding)) |&gt;
    arrange(fiscalDateEnding)

  return(df)
}

## financial statement
income &lt;- get_data(&quot;INCOME_STATEMENT&quot;,&quot;AAPL&quot;)
balance &lt;- get_data(&quot;BALANCE_SHEET&quot;,&quot;AAPL&quot;)
cashflow &lt;- get_data(&quot;CASH_FLOW&quot;,&quot;AAPL&quot;)
</pre><p>Let’s visualize the income statement</p>
<img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-2-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />




<h2 id="income">Income Statement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#income" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Alright, here we go! The income statement tells the story of how the goods and services are doing in the market, how much it costs to produce and sell them, and how much profit is left after all expenses are accounted for. It is a dynamic statement that shows the flow of money over a period of time, typically a quarter or a year. It is like a movie telling the story of the company’s operations and profitability. Below are some ratios and heuristics for spotting companies with a durable competitive advantage, according to Warren Buffett.</p>




<h3 id="gpm">Gross Profit Margin
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#gpm" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Gross Profit Margin = Gross Profit / Total Revenue\)</code></p>
<p>In the book, we are looking for a consistent margin, as a general rule <code>above 40%</code>, to consider a company to have a durable competitive advantage. Gross profit here is basically <code>Total Revenue - Cost of Revenue</code>; it does not account for R&D, administrative costs, etc. Gross profit margin is a measure of how much profit a company makes from its core operations, before accounting for other expenses. A higher gross profit margin indicates that the company has a strong competitive position in the market and is able to generate more profit from its sales.</p>
<p>So for Apple, you can see that in the sales section there are products and services. I assume products are the hardware and services are things like cloud storage. The cost of revenue section mirrors these categories, showing how much it costs to make these products/services. Again, remember this is all just about the goods; it does not account for the R&D, the offices that manage these, administrative costs, etc., I think. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
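<p>As a quick sanity check, here is a minimal base-R sketch of the gross profit margin on made-up numbers (purely illustrative figures, not Apple’s actual filings):</p>

```r
# Illustrative (made-up) income statement figures, in millions
total_revenue <- 400000
cost_of_revenue <- 220000

gross_profit <- total_revenue - cost_of_revenue   # revenue minus cost of revenue only
gross_profit_margin <- gross_profit / total_revenue

gross_profit_margin        # 0.45
gross_profit_margin > 0.4  # clears the book's ~40% heuristic
```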




<h3 id="depreciation">Depreciation
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#depreciation" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Apparently this is a non-cash expense that reflects the reduction in value of a company’s assets over time. It is an accounting method used to allocate the cost of tangible assets (like machinery, equipment, buildings) and intangible assets (like patents, copyrights) over their useful lives. Depreciation allows companies to spread out the expense of an asset over several years, rather than recognizing the entire cost in the year it was purchased.</p>
<p>A quick ratio in the book is</p>
<p><code>\(Depreciation / Gross Profit\)</code></p>
<p>which should be low <code>~5-7%</code> which indicates that the company is not heavily reliant on physical assets that may lose value over time. Warren stated that EBITDA (Earnings Before Depreciation, Taxes, and Amortization) is something Wall Street loves, but in Warren’s eyes, depreciation is a real expanse. Why is it that he said EBITDA is something Wall Street love? Because it shows a higher profit by excluding non-cash expenses like depreciation, which can make a company look more profitable than it actually is… interesting. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="interest">Interest Payment to Operating Income
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#interest" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Interest Payment / Operating Income\)</code></p>
<p>should be <code>less than 15%</code>. This ratio indicates that the company is not overly burdened by debt and has a healthy balance between its operating income and interest expenses. A lower ratio suggests that the company is generating sufficient operating income to cover its interest payments, which is a positive sign of financial stability.</p>
<p><code>Operating Income = Total Revenue - Cost of Revenue - Operating Expenses</code>. And <code>Operating Expenses = R&D + Selling, General and Administrative Expenses</code>. Wow, so many terminologies and I still don’t fully understand them, but at least I can refer back to this when I look at these statements.</p>
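<p>The decomposition above can be sketched in base R with made-up numbers (illustrative, not from any filing):</p>

```r
# Illustrative (made-up) figures, in millions
total_revenue <- 400000
cost_of_revenue <- 220000
rnd <- 30000               # R&D
sga <- 25000               # selling, general and administrative
interest_expense <- 4000

operating_expenses <- rnd + sga
operating_income <- total_revenue - cost_of_revenue - operating_expenses

interest_to_operating <- interest_expense / operating_income
interest_to_operating         # 0.032
interest_to_operating < 0.15  # well under the book's 15% threshold
```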




<h3 id="incomebeforetax">Income Before Tax
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#incomebeforetax" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is the value Warren uses when he calculates the return he is getting when he buys a whole business.</p>




<h3 id="incomeaftertax">Income After Tax
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#incomeaftertax" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This value is the truth test of a business. What is reported to the SEC should reflect the pre-tax income reported on the income statement. Apparently some companies like to report higher values than the truth? That was what the book said <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="netearnings">Net Earnings
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#netearnings" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Net Earning / Total Revenue\)</code></p>
<p>The heuristic here is to look for <code>more than 20%</code>, which indicates that the company is able to generate a significant amount of profit from its total revenue, and hence a long-term competitive advantage.</p>
<p><code>Net Earnings = Operating Income - Interest Expense - Taxes</code>. This is the bottom line of the income statement, representing the company’s total profit after all expenses have been deducted from total revenue. It is a key indicator of a company’s profitability and financial performance. A higher net earnings figure indicates that the company is generating more profit from its operations, which can be a sign of a strong competitive position in the market.</p>
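<p>A minimal base-R sketch of the net earnings margin on made-up figures (illustrative only):</p>

```r
# Illustrative (made-up) figures, in millions
operating_income <- 125000
interest_expense <- 4000
taxes <- 19000
total_revenue <- 400000

net_earnings <- operating_income - interest_expense - taxes
net_earning_margin <- net_earnings / total_revenue

net_earning_margin        # 0.255
net_earning_margin > 0.2  # clears the 20% heuristic
```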




<h3 id="share">Per Share Earning
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#share" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is something Warren wants to see consistent increment over time.</p>
<p>This is calculated by:
<code>\(EPS = Net Income/Outstanding Share\)</code></p>
<p>Does that mean if net income is negative, we’ll have a negative EPS? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
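<p>A tiny base-R sketch of EPS on made-up figures (the share count is hypothetical; income and shares both in millions):</p>

```r
# Illustrative (made-up) figures
net_income <- 102000          # in millions
shares_outstanding <- 15000   # in millions

eps <- net_income / shares_outstanding
eps  # 6.8

# And yes: a negative net income gives a negative EPS
-net_income / shares_outstanding  # -6.8
```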




<h3 id="operatingmargin">Operating Margin
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#operatingmargin" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Operating Margin = Operating Income / Total Revenue\)</code></p>
<p>This is a measure of a company’s profitability that indicates how much profit it generates from its operations relative to its total revenue. A higher operating margin suggests that the company is more efficient at converting revenue into profit, which can be a sign of a strong competitive position in the market. Warren looks for <code>more than 10%</code>, which indicates that the company has a durable competitive advantage.</p>




<h4 id="metric1">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metric1" rel="nofollow" target="_blank"></a>
</h4>
<pre>income_metric &lt;- function(df) {
  df_i &lt;- df |&gt;
    mutate(grossProfitMargin = grossProfit / totalRevenue,
           depreciationToGrossProfit = depreciationAndAmortization / grossProfit,
           interestExpenseToOperatingIncome = interestExpense / operatingIncome,
           netEarningMargin = netIncome / totalRevenue,
           operatingMargin = operatingIncome / totalRevenue)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;grossProfitMargin&quot;, 0.4, &quot;red&quot;, 
    &quot;depreciationToGrossProfit&quot;, 0.07, &quot;blue&quot;,
    &quot;interestExpenseToOperatingIncome&quot;, 0.15, &quot;blue&quot;,
    &quot;netEarningMargin&quot;, 0.2, &quot;red&quot;,
    &quot;operatingMargin&quot;, 0.1, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_i |&gt;
    select(fiscalDateEnding, grossProfitMargin:operatingMargin) |&gt;
    pivot_longer(cols = c(grossProfitMargin:operatingMargin), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scales = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

income_metric(income) 
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-3-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>What is EBITDA? It stands for <code>Earnings Before Interest, Taxes, Depreciation, and Amortization</code>. It is a measure of a company’s operating performance that excludes non-operating items such as interest, taxes, depreciation, and amortization. It is often used as a proxy for cash flow from operations, as it focuses on the core profitability of the business before accounting for non-cash expenses and tax obligations.</p>




<h2 id="balance">Balance Sheet
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#balance" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p><code>\(Asset = Liability + Shareholder Equity\)</code></p>
<p>Reminded me of Fullmetal Alchemist’s famous phrase “tōka kōkan”, equivalent exchange.</p>
<p align="center">
<img loading="lazy" src="https://i2.wp.com/images5.alphacoders.com/840/840678.jpg?w=50%25&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
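<p>The identity can be sketched in base R on made-up figures; shareholder equity is simply what is left of assets after liabilities:</p>

```r
# The accounting identity: assets = liabilities + shareholders' equity
# Illustrative (made-up) figures, in millions
total_assets <- 350000
total_liabilities <- 280000
shareholder_equity <- total_assets - total_liabilities

shareholder_equity  # 70000
total_assets == total_liabilities + shareholder_equity  # TRUE, by construction
```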




<h3 id="asset">Current Assets
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#asset" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>A current asset is any asset that can reasonably be expected to be converted into cash within one year. This includes cash and cash equivalents, accounts receivable, inventory, and other short-term assets. Current assets are important because they provide insight into a company’s liquidity and ability to meet its short-term obligations.</p>




<h3 id="receivables">Net Receivables To Gross Sale Ratio
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#receivables" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Net Receivables / Gross Sale\)</code></p>
<p>If a company consistently has a lower ratio (around <code>5%</code> or less), it may indicate that the company is efficient at collecting payments from customers and has a lower risk of bad debts. A higher ratio may suggest that the company is having difficulty collecting payments, which could lead to cash flow issues and potential losses from uncollected receivables.</p>




<h3 id="currentratio">The Current Ratio
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#currentratio" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Current Ratio = Current Assets / Current Liabilities\)</code></p>
<p>The current ratio is a liquidity ratio that measures a company’s ability to pay off its short-term liabilities with its short-term assets. A current ratio of <code>1 or higher</code> is generally considered good, indicating that the company has enough assets to cover its liabilities. A current ratio below <code>1</code> may indicate that the company may have difficulty meeting its short-term obligations. This makes sense.</p>
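As a quick base R sanity check of the formula above (toy numbers, not any real filing):

```r
# Hypothetical balance sheet figures, in millions
total_current_assets <- 135
total_current_liabilities <- 154

# Current Ratio = Current Assets / Current Liabilities
current_ratio <- total_current_assets / total_current_liabilities
round(current_ratio, 2)  # below 1, so short-term obligations outweigh short-term assets
```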




<h3 id="ppe">Property, Plant, and Equipment
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#ppe" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is the value of a company’s physical assets, such as land, buildings, machinery, and equipment. It is important to consider the value of PPE when evaluating a company’s financial health and potential for growth. A company with a significant amount of PPE may have a competitive advantage in its industry, as it may be able to produce goods or services more efficiently than its competitors. However, it is also important to consider the age and condition of the PPE, as well as any potential liabilities associated with it. In the title of this chapter, it says <code>For Warren Not Having Them Is A Good Thing</code> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="short">Short Term Debt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#short" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>From Warren’s perspective, when it comes to investing in financial institutions, he has always shied away from companies that borrow more heavily short-term than long-term.</p>
<p><code>\(Short Term Debt / Long Term Debt\)</code></p>
<p>Not really sure what the heuristic threshold is, but let’s use <code>less than 1</code> as a good indicator.</p>




<h3 id="long">Long Term Debt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#long" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>As a general rule, a company with a durable competitive edge will have little or no long-term debt to maintain its business operations. Let’s make sure this is not trending upward when we visualize it.</p>




<h3 id="retained">Retained Earnings
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#retained" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Retained earnings is the portion of a company’s net income that is retained and not distributed as dividends to shareholders. It represents the accumulated profits that a company has reinvested in its business over time. Retained earnings can be used for various purposes, such as funding research and development, expanding operations, paying off debt, or acquiring other companies. It is an important metric for investors to consider when evaluating a company’s financial health and growth potential, as it indicates how much profit the company has generated and how it has been utilized to support its long-term success.</p>
<p><code>\(Growth Rate = (Ending Retained Earnings/Beginning Retained Earnings)^{1/years}-1\)</code></p>
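The growth rate formula above is a compound annual growth rate; here is a base R sketch with made-up retained earnings (hypothetical figures, not any real company):

```r
# Hypothetical retained earnings over five fiscal years, in millions
retained_earnings <- c(45, 52, 60, 70, 83)
years <- length(retained_earnings) - 1  # four compounding periods

# Growth Rate = (Ending Retained Earnings / Beginning Retained Earnings)^(1/years) - 1
growth_rate <- (retained_earnings[length(retained_earnings)] /
                  retained_earnings[1])^(1 / years) - 1
round(growth_rate, 3)  # about 0.165, i.e. roughly 16.5% per year
```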




<h3 id="treasury">Treasury Stock
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#treasury" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Treasury stock refers to shares that a company has repurchased from its shareholders. These shares are held in the company’s treasury and are not considered outstanding shares. Treasury stock can be used for various purposes, such as to increase shareholder value, to have shares available for employee compensation plans, or to prevent hostile takeovers. When a company repurchases its own shares, it reduces the number of outstanding shares in the market, which can increase the value of the remaining shares and potentially boost earnings per share (EPS). However, it is important for investors to consider the reasons behind a company’s decision to buy back its own stock and how it may impact the company’s financial health and long-term growth prospects.</p>




<h3 id="ROSE">Return On Shareholders’ Equity
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#ROSE" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Return On Shareholders' Equity = Net Income / Shareholders' Equity\)</code></p>
<p>According to the book, <code>high ROSE means come play</code> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /> How about stop and smell the rose? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f339.png" alt="🌹" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The book did not tell us exactly what the threshold is, but the companies of choice have about <code>~30-35%</code></p>
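A minimal base R sketch of the ROSE computation, with hypothetical figures chosen to land in that <code>~30-35%</code> band:

```r
# Hypothetical figures, in millions
net_income <- 95
shareholders_equity <- 280

# Return On Shareholders' Equity = Net Income / Shareholders' Equity
rose <- net_income / shareholders_equity
round(rose, 2)  # 0.34, within the ~30-35% band
```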




<h4 id="metrics">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metrics" rel="nofollow" target="_blank"></a>
</h4>
<pre>balance_metric &lt;- function(df, income) {
  df_b &lt;- df |&gt;
    mutate(currentRatio = totalCurrentAssets / totalCurrentLiabilities,
           netReceivablesToGrossSale = currentNetReceivables / income$grossProfit,
           shortToLongTermDebt = shortTermDebt / longTermDebt,
           growthRateRetainedEarnings = (retainedEarnings / lag(retainedEarnings))^(1/n()) - 1,
           returnOnShareholdersEquity = income$netIncome / totalShareholderEquity)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;currentRatio&quot;, 1, &quot;red&quot;, 
    &quot;netReceivablesToGrossSale&quot;, 0.05, &quot;blue&quot;,
    &quot;shortToLongTermDebt&quot;, 1, &quot;blue&quot;,
    &quot;growthRateRetainedEarnings&quot;, 0.05, &quot;red&quot;,
    &quot;returnOnShareholdersEquity&quot;, 0.3, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_b |&gt;
    select(fiscalDateEnding, currentRatio:returnOnShareholdersEquity) |&gt;
    pivot_longer(cols = c(currentRatio:returnOnShareholdersEquity), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

balance_metric(balance, income)
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-4-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />




<h2 id="cashflow">Cash Flow
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#cashflow" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>




<h3 id="operating">Operating Income
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#operating" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Cash flow from operating income starts with net income and then adds back depreciation and amortization. This is because depreciation and amortization are non-cash expenses that reduce net income but do not actually involve a cash outflow. By adding them back, we get a better picture of the actual cash generated by the company’s operations.</p>
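The add-back described above, as a base R sketch (hypothetical figures):

```r
# Hypothetical income statement figures, in millions
net_income <- 95
depreciation_and_amortization <- 11

# D&A reduced net income on paper, but no cash actually left the company,
# so we add it back to approximate cash generated by operations
operating_cash_flow <- net_income + depreciation_and_amortization
operating_cash_flow  # 106
```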




<h3 id="capex">Capital Expenditure
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#capex" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Capital expenditure, also known as CapEx, refers to the funds that a company uses to acquire, upgrade, and maintain physical assets such as property, buildings, technology, equipment, or machinery. CapEx is an important metric for investors to consider when evaluating a company’s financial health and growth potential, as it indicates how much the company is investing in its long-term success.</p>
<p>If we were to look at Apple’s cash flow statement, CapEx is payments for acquisition of property, plant, and equipment.</p>
<p><code>\(Capital Expenditure / Net Earnings\)</code></p>
<p>And the heuristic is <code>~50% or less</code>.</p>
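Checking that heuristic in base R (hypothetical figures; note CapEx is often reported as a negative cash flow, so take the absolute value):

```r
# Hypothetical cash flow figures, in millions
capital_expenditures <- -11  # reported as a cash outflow
net_earnings <- 95

capex_to_net_earnings <- abs(capital_expenditures) / net_earnings
capex_to_net_earnings < 0.5  # TRUE: within the ~50%-or-less heuristic
```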




<h3 id="buyback">Stock Buyback
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#buyback" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Stock buyback, also known as share repurchase, refers to a company’s practice of buying back its own shares from the open market. This can be done for various reasons, such as to increase shareholder value, to have shares available for employee compensation plans, or to prevent hostile takeovers. When a company repurchases its own shares, it reduces the number of outstanding shares in the market, which can increase the value of the remaining shares and potentially boost earnings per share (EPS).</p>
<p>On the Apple cash flow statement, stock buyback is listed as repurchase of common stock. Other companies might report it as issuance (retirement) of stock.</p>




<h4 id="metrics-1">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metrics-1" rel="nofollow" target="_blank"></a>
</h4>
<pre>cashflow_metric &lt;- function(df) {
  df_c &lt;- df |&gt;
    mutate(operatingCashFlow = netIncome + depreciationDepletionAndAmortization,
           capitalExpenditureToNetEarning = capitalExpenditures / netIncome,
           stockBuybackToNetEarning = abs(proceedsFromRepurchaseOfEquity / netIncome))
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;operatingCashFlow&quot;, 0, &quot;red&quot;, 
    &quot;capitalExpenditureToNetEarning&quot;, 0.5, &quot;blue&quot;,
    &quot;stockBuybackToNetEarning&quot;, 0.5, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_c |&gt;
    select(fiscalDateEnding, operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning) |&gt;
    pivot_longer(cols = c(operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

cashflow_metric(cashflow)
</pre><img src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-5-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Use key valuation metrics (P/E, EV/EBITDA, P/B, P/S, etc) to determine how cheap or expensive a stock is.</p>




<h2 id="combine">Combine All Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#combine" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<details>
<summary>code</summary>
<pre>library(httr)
library(jsonlite)
library(tidyverse)
library(ggpubr)

api_key &lt;- Sys.getenv(&quot;avkey&quot;)

## Create a function to pull data
get_data &lt;- function(fx,ticker,share=F) {
  raw &lt;- GET(paste0(
    &quot;https://www.alphavantage.co/query?function=&quot;,fx,
    &quot;&#038;symbol=&quot;,ticker,&quot;&#038;apikey=&quot;, api_key
  )) %&gt;%
    content(as = &quot;text&quot;, encoding = &quot;UTF-8&quot;) %&gt;%
    fromJSON()
  
  if (share==T) {
    df &lt;- raw$annualEarnings |&gt;
      select(fiscalDateEnding, reportedEPS) |&gt;
      mutate(fiscalDateEnding = ymd(fiscalDateEnding))
  } else {
  df &lt;- raw$annualReports |&gt; 
    as_tibble() |&gt;
    mutate(fiscalDateEnding = as.Date(fiscalDateEnding)) |&gt;   
    mutate(across(where(is.character), as.numeric)) |&gt;        
    arrange(fiscalDateEnding)
  }

  return(df)
}

income_metric &lt;- function(df) {
  df_i &lt;- df |&gt;
    mutate(grossProfitMargin = grossProfit / totalRevenue,
           depreciationToGrossProfit = depreciationAndAmortization / grossProfit,
           interestExpenseToOperatingIncome = interestExpense / operatingIncome,
           netEarningMargin = netIncome / totalRevenue,
           operatingMargin = operatingIncome / totalRevenue)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;grossProfitMargin&quot;, 0.4, &quot;red&quot;, 
    &quot;depreciationToGrossProfit&quot;, 0.07, &quot;blue&quot;,
    &quot;interestExpenseToOperatingIncome&quot;, 0.15, &quot;blue&quot;,
    &quot;netEarningMargin&quot;, 0.2, &quot;red&quot;,
    &quot;operatingMargin&quot;, 0.1, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_i |&gt;
    select(fiscalDateEnding, grossProfitMargin:operatingMargin) |&gt;
    pivot_longer(cols = c(grossProfitMargin:operatingMargin), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Income Statement Metrics&quot;)
  
  return(plot)
}

share_metric &lt;- function(df) {
    plot &lt;- df |&gt;
      mutate(reportedEPS = as.numeric(reportedEPS)) |&gt;
      ggplot(aes(x = fiscalDateEnding, y = reportedEPS)) +
      geom_line() +
      geom_smooth() +
      theme_bw() +
      ggtitle(&quot;Share Metrics&quot;)
    return(plot)
}

balance_metric &lt;- function(df, income) {
  df_b &lt;- df |&gt;
    mutate(currentRatio = totalCurrentAssets / totalCurrentLiabilities,
           netReceivablesToGrossSale = currentNetReceivables / income$grossProfit,
           shortToLongTermDebt = shortTermDebt / longTermDebt,
           growthRateRetainedEarnings = (retainedEarnings / lag(retainedEarnings))^(1/n()) - 1,
           returnOnShareholdersEquity = income$netIncome / totalShareholderEquity)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;currentRatio&quot;, 1, &quot;red&quot;, 
    &quot;netReceivablesToGrossSale&quot;, 0.3, &quot;blue&quot;,
    &quot;shortToLongTermDebt&quot;, 1, &quot;blue&quot;,
    &quot;growthRateRetainedEarnings&quot;, 0.05, &quot;red&quot;,
    &quot;returnOnShareholdersEquity&quot;, 0.3, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_b |&gt;
    select(fiscalDateEnding, currentRatio:returnOnShareholdersEquity) |&gt;
    pivot_longer(cols = c(currentRatio:returnOnShareholdersEquity), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt;
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Balance Sheet Metrics&quot;)
  
  return(plot)
}

cashflow_metric &lt;- function(df, income) {
  df_c &lt;- df |&gt;
    mutate(operatingCashFlow = netIncome + depreciationDepletionAndAmortization,
           capitalExpenditureToNetEarning = capitalExpenditures / netIncome,
           stockBuybackToNetEarning = case_when(
             proceedsFromRepurchaseOfEquity &lt; 0 ~ -proceedsFromRepurchaseOfEquity / income$netIncome,
             income$netIncome &lt; 0 &#038; proceedsFromRepurchaseOfEquity &gt; 0 ~ NA_real_,
             income$netIncome &lt; 0 &#038; proceedsFromRepurchaseOfEquity &lt; 0 ~-proceedsFromRepurchaseOfEquity / income$netIncome))
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;operatingCashFlow&quot;, 0, &quot;red&quot;, 
    &quot;capitalExpenditureToNetEarning&quot;, 0.5, &quot;blue&quot;,
    &quot;stockBuybackToNetEarning&quot;, 0.5, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_c |&gt;
    select(fiscalDateEnding, operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning) |&gt;
    pivot_longer(cols = c(operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Cash Flow Metrics&quot;)
  
  return(plot)
}

show_all &lt;- function(name) {
  income &lt;- get_data(&quot;INCOME_STATEMENT&quot;,name)
  share &lt;- get_data(&quot;EARNINGS&quot;,name,share=T)
  balance &lt;- get_data(&quot;BALANCE_SHEET&quot;,name)
  cashflow &lt;- get_data(&quot;CASH_FLOW&quot;,name)
  plot1 &lt;- income_metric(income) 
  plot2 &lt;- share_metric(share)
  plot3 &lt;- balance_metric(balance, income)
  plot4 &lt;- cashflow_metric(cashflow, income)
  stacked_plot &lt;- ggarrange(plot1,plot3,nrow=2)
  squished_plot &lt;- ggarrange(plot4,plot2,ncol=2, widths = c(2,1))
  combineplot &lt;- ggarrange(plotlist = list(stacked_plot,squished_plot), nrow = 2, heights = c(2,1))
  return(combineplot)
}
</pre></details>




<h2 id="another">Let’s Look At Another Example
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#another" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>As a heuristic, we’ve made our dataviz so that <code>red is like lava</code>: you want to stay above it. <code>Blue is like the sky</code>: you want to stay below it. The sweet zone is in between. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /> When there is a loess curve, we are trying to see if EPS is consistently increasing.</p>




<h4 id="apple">Apple
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#apple" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;AAPL&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-7-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Looking at the income metrics, Apple seems to be doing pretty well. Gross profit margin, net earning margin, and operating income margin are all above their thresholds. The depreciation to gross profit ratio is appropriately below threshold, which means there are no recent purchases of property, plants, machines, etc., which are what add to depreciation. There is also low to no interest expense, which is good! No debt? Next we move on to balance sheet metrics: the current ratio is not great, so liquidity is not great. Net receivables are ?OK; I guess it makes sense that if your products are popular and you provide some sort of financing option, you might carry some receivables. Short term debt is much lower than long term debt, which is good. The ROSE is smelling pretty good there too! In terms of cash flow metrics, it’s looking really good: cash flowing in is high, CapEx is low, and it has been consistently buying back its own shares from 2020 onwards.</p>




<h4 id="microsoft">Microsoft
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#microsoft" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;MSFT&quot;)
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-8-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>As for Microsoft, now that we’re a bit more familiar with the sections, let’s string our read together instead of separating them. Good, consistent margins across gross profit, operating income, and net income. Interestingly, depreciation is high along with CapEx; did they buy property, plants, or machines? 
<a href="https://www.ciodive.com/news/microsoft-azure-capacity-constraints-datacenter-buildouts-cloud-ai/722912/" rel="nofollow" target="_blank">Ah, they built data centers?</a> Interest expense is low, which is good. Good liquidity, with the current ratio above threshold; net receivables are OK, same as Apple (a competitor); short/long debt is good as well. ROSE is close to the threshold (worth watching). Cash is flowing, not much stock buyback, but overall EPS is consistently increasing. With its investment in data centers for Azure, over the next few years we should be seeing a persistent depreciation to gross profit ratio along with CapEx, right? I read somewhere that you can’t book all the depreciation in a single year.</p>




<h4 id="nvidia">NVIDIA
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#nvidia" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;NVDA&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-9-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Now let’s take a look at NVIDIA. Wow, gross profit, operating income, and net earning margins are all really good, better than Apple and Microsoft! Well, that makes sense: with all these data centers, they need GPUs from NVIDIA. Look at that depreciation and CapEx, barely any! That’s great; there is less to maintain, and the current plants they have are adequate to supply the demand. Low interest expense too, so not much debt interest. Liquidity is great, with a high current ratio! Net receivables are good too; these big companies are paying NVIDIA back! ROSE is also very fragrant! Cash is flowing through the roof. Not much stock buyback. Also exponential EPS! You know, Warren did say to be mindful of companies with R&D costs.</p>




<h4 id="intel">Intel
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#intel" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;INTC&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-10-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Wow, at first glance at the profits, we’re seeing lava red. We then see a steep rise in depreciation and CapEx. Did they buy more property, plants, or machines? 
<a href="https://newsroom.intel.com/press-kit/intel-invests-ohio" rel="nofollow" target="_blank">Intel announces plans for an investment of more than $28 billion for two new chip factories in Licking County, Ohio</a>. And maybe Germany and Arizona too? Interesting. Notice that the interest expense ratio towards the end of 2025 was double digits but negative? That’s because operating income was negative while interest expense was high. I wonder if I should make this absolute as opposed to keeping the negative. Anyway, this means that not only was Intel in the red, but it was also paying quite a bit of interest on its debt, I assume. The current ratio looks pretty good, and the same goes for net receivables. Short term debt is also much lower compared to long term. ROSE, not so good. As for the cash flow, adding depreciation and amortization back to net income brought things back up out of the red. Notice how the EPS dropped significantly lately, and notice that the share buyback is missing? We coded it so that a positive buyback value, which means issuing stock instead of repurchasing, becomes NA. Wow, what do you think? Will they be able to turn this around? Do data centers use Intel chips or AMD’s threadripper?</p>




<h4 id="amd">AMD
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#amd" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;AMD&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-11-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Interesting income statement metric results. Good gross profit margin, but net earnings barely clear the threshold. Interesting depreciation trend: what happened in 2024, when CapEx didn’t budge? Great liquidity, pretty high net receivables. Wow, very high short-to-long term debt in ?2021-2022. ROSE is essentially 0. Cash flow is good toward 2025, and there were high buybacks around 2022. Finally, an uptrending EPS! This is an odd one. Pasting my observations into Claude and wow, these findings are due to the Xilinx acquisition. That makes sense! They didn’t build new plants, and the old property, plants, and machines have already been depreciated, hence low depreciation. The acquisition was funded by debt, hence the high short-to-long term debt ratio. Very interesting, indeed! So, could it be true that data centers prefer AMD over Intel, given the good profit?</p>




<h4 id="google">Google
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#google" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;GOOG&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-12-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Very good profits across all three metrics. After 2020, depreciation downtrended, which is good, and so did CapEx. Not much debt. Downtrending current ratio, but still good. Very good net receivables! ROSE is coming up. Great cash flow. There are buybacks of equity. Rising EPS. All great signs for Google! Being an all-service (no hardware?) company, this is quite good and healthy!</p>




<h4 id="3m">3M
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#3m" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;MMM&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-13-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Now on to 3M, not a tech company, but let’s see if our workflow of looking at financial statements helps us tell a story. Profit is not as great as the tech companies we looked at: gross profit is above threshold, but net earnings are in the lava. Notice both net earnings and operating margins dipped quite low in 2024? And then a pretty high interest-expense-to-operating-income ratio in 2025? I wonder whether, because they were losing money, they borrowed long term, which is why the short-to-long term debt ratio is not too high? The current ratio is acceptable, but why are net receivables so high for 3M? Were clients not able to pay 3M? ROSE is pretty good in 2024 and 2025 even when profits weren’t. But why? Cash flow shows high variance over the last 3 years. There was a stock buyback in 2025. And even though EPS downtrended over the past 2 years, it still remained high. This is a very interesting one as well. 3M should be a company with a competitive advantage because they make lots of daily consumables that don’t need a whole lot of R&D. But why the anomaly in 2024 requiring debt? What do you think?</p>




<h4 id="disney">Disney
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#disney" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;DIS&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-14-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Alright, what about Disney? Profit doesn’t seem as good as I expected. Both gross profit and net earnings margins were below threshold; only the operating income margin wasn’t. Interestingly, in 2021 there was an increase in depreciation, CapEx, and interest expense with a low short-to-long term debt ratio. It almost looks like they borrowed some money to purchase something new? Did they rebuild a park or something? In terms of liquidity, it’s in the lava zone in 2025, and it seems to have downtrended for the past 3 years as well. Net receivables are OK? Though I would think they should be lower? ROSE is red. Cash flow looks good, which is interesting because whatever caused the depreciation, when added back, now gives them cash. Also interesting to note that from 2020 to 2024 they were issuing stock rather than buying back. Their EPS appears quite volatile for the past 3 years.</p>




<h3 id="boeing">Boeing
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#boeing" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<pre>show_all(&quot;BA&quot;)
</pre><img src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-15-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Last but not least, let’s look at Boeing. All three profit margin metrics are in the lava zone. There was a spike in interest expense in 2021, along with depreciation. Something physical was bought with borrowed money that’s being paid off long term. The current ratio is good. Essentially no net receivables. Consistently floating ROSE on the red sea. Operating cash flow was in the red for a few years, then positive in 2025. CapEx spiked in 2025; what was the capital spent on that doesn’t really depreciate? Stock buyback is interesting: it’s missing for the past several years, meaning they were essentially issuing stock wherever we have NA. EPS downtrended into the negatives.</p>




<h2 id="opportunities">Opportunities For Improvement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#opportunities" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>We need to recheck the growth rate calculation again; not sure the code is correct</li>
<li>Perhaps build this into an R package and develop it further</li>
<li>Learn about valuation metrics such as P/E, EV/EBITDA, P/B, P/S</li>
<li>Not really sure if a short-to-long term debt ratio threshold of 1 is good; it seems too high</li>
<li>Include actual numbers on geom_label with ggrepel and reduce the font size</li>
</ul>




<h2 id="lessons">Lessons learnt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#lessons" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>Depreciation and CapEx seem to be strongly correlated, which makes sense.</li>
<li>Learnt the Alpha Vantage API; it’s quite straightforward.</li>
<li>Learnt to look at financial statements and their metrics.</li>
</ul>
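<p>The depreciation–CapEx relationship is easy to check numerically; a minimal sketch, assuming a data frame <code>df</code> with hypothetical <code>depreciation</code> and <code>capex</code> columns from the statements pulled earlier:</p>
<pre># Pearson correlation between depreciation and capital expenditure,
# ignoring years where either value is missing
cor(df$depreciation, df$capex, use = &quot;complete.obs&quot;)
</pre>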
<p>If you like this article:</p>
<ul>
<li>please feel free to send me a 
<a href="https://www.kenkoonwong.com/blog/" rel="nofollow" target="_blank">comment or visit my other blogs</a></li>
<li>please feel free to follow me on 
<a href="https://bsky.app/profile/kenkoonwong.bsky.social" rel="nofollow" target="_blank">BlueSky</a>, 
<a href="https://twitter.com/kenkoonwong/" rel="nofollow" target="_blank">twitter</a>, 
<a href="https://github.com/kenkoonwong/" rel="nofollow" target="_blank">GitHub</a> or 
<a href="https://rstats.me/@kenkoonwong" rel="nofollow" target="_blank">Mastodon</a></li>
<li>if you would like to collaborate, please feel free to 
<a href="https://www.kenkoonwong.com/contact/" rel="nofollow" target="_blank">contact me</a></li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.kenkoonwong.com/blog/financial-statement/"> r on Everyday Is A School Day</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/">Navigating Financial Statement And The Story It Tells Us – A Note To Myself</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400170</post-id>	</item>
		<item>
		<title>Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</title>
		<link>https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/">Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>Following <a href="https://thierrymoudiki.github.io/blog/2026/03/08/r/exact-shapley-dynrmf" rel="nofollow" target="_blank">the post on exact Shapley values</a> for time series explainability, this post shows how to use sensitivity analysis to explain time-series forecasts, based on the <code>ahead::dynrmf</code> model and external regressors. What is <strong>sensitivity analysis</strong> in this context? It’s about evaluating the impact of changes in the external regressors on the time-series forecast.</p>
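<p>Conceptually, this kind of sensitivity can be approximated by a finite difference: perturb the external regressor slightly, re-forecast, and compare. A minimal sketch of the idea (not the internal implementation of <code>dynrmf_sensi</code>; <code>forecast_fn</code> stands in for any model mapping <code>xreg</code> to a forecast):</p>
<pre>eps &lt;- 1e-3
base &lt;- forecast_fn(y, xreg)          # baseline forecast
bumped &lt;- forecast_fn(y, xreg + eps)  # forecast with perturbed regressor
sensi &lt;- (bumped - base) / eps        # approximate d(forecast)/d(xreg)
</pre>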

<p>The post uses the <a href="https://docs.techtonique.net/ahead/reference/dynrmf_sensi.html" rel="nofollow" target="_blank"><code>ahead::dynrmf_sensi</code></a> function to compute the sensitivities, and the <a href="https://docs.techtonique.net/ahead/reference/plot_dynrmf_sensitivity.html" rel="nofollow" target="_blank"><code>ahead::plot_dynrmf_sensitivity</code></a> function to plot the results.</p>

<p>First, install the package:</p>

<pre>devtools::install_github(&quot;Techtonique/ahead&quot;)
</pre>

<p>Then, run the following code:</p>

<pre># devtools::install_github(&quot;Techtonique/ahead&quot;)
# install.packages(c(&quot;fpp2&quot;, &quot;e1071&quot;, &quot;patchwork&quot;))

library(ahead)
library(fpp2)
library(patchwork)
library(e1071)

#' # Example 1: US Consumption vs Income
sensitivity_results_auto &lt;- ahead::dynrmf_sensi(
y = fpp2::uschange[, &quot;Consumption&quot;],
xreg = fpp2::uschange[, &quot;Income&quot;],
h = 10
)

plot1 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_auto, 
                           title = &quot;Sensitivity of Consumption to Income (Ridge)&quot;,
                           y_label = &quot;Effect (ΔConsumption / ΔIncome)&quot;)

#' # Example 1: US Consumption vs Income
sensitivity_results_auto_svm &lt;- ahead::dynrmf_sensi(
  y = fpp2::uschange[, &quot;Consumption&quot;],
  xreg = fpp2::uschange[, &quot;Income&quot;],
  h = 10, 
  fit_func = e1071::svm # additional parameter passed to ahead::dynrmf
)

plot2 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_auto_svm, 
                                        title = &quot;Sensitivity of Consumption to Income (SVM)&quot;,
                                        y_label = &quot;Effect (ΔConsumption / ΔIncome)&quot;)

 
# Example 2: TV Advertising vs Insurance Quotes
sensitivity_results_tv &lt;- ahead::dynrmf_sensi(
 y = fpp2::insurance[, &quot;Quotes&quot;],
   xreg = fpp2::insurance[, &quot;TV.advert&quot;],
   h = 8
 )

plot3 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_tv,
                           title = &quot;Sensitivity of Insurance Quotes to TV Advertising (Ridge)&quot;,
                           y_label = &quot;Effect (ΔQuotes / ΔTV.advert)&quot;)

sensitivity_results_tv_svm &lt;- ahead::dynrmf_sensi(
  y = fpp2::insurance[, &quot;Quotes&quot;],
  xreg = fpp2::insurance[, &quot;TV.advert&quot;],
  h = 8, 
  fit_func = e1071::svm # additional parameter passed to ahead::dynrmf
)

plot4 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_tv_svm,
                                        title = &quot;Sensitivity of Insurance Quotes to TV Advertising (SVM)&quot;,
                                        y_label = &quot;Effect (ΔQuotes / ΔTV.advert)&quot;)

(plot1+plot2)

(plot3+plot4)
</pre>

<p><img src="https://i0.wp.com/thierrymoudiki.github.io/images/2026-03-29/2026-03-29-image1.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" />
<img src="https://i1.wp.com/thierrymoudiki.github.io/images/2026-03-29/2026-03-29-image2.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/">Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400155</post-id>	</item>
		<item>
		<title>Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</title>
		<link>https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/</link>
		
		<dc:creator><![CDATA[Selcuk Disci]]></dc:creator>
		<pubDate>Sat, 28 Mar 2026 12:48:37 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://datageeek.com/?p=11857</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> LLM-enhanced momentum investing combines traditional momentum signals with real-time news interpretation by large language models (LLMs). The idea is straightforward: stocks with strong past returns are candidates for momentum portfolios, but their inclusion and weight are refined by LLM-generated sentiment scores derived from firm-specific news. This hybrid approach improves risk-adjusted ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/">Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/"> DataGeeek</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p class="wp-block-paragraph"><strong><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5680782" rel="nofollow" target="_blank"><em>LLM-enhanced momentum investing</em></a></strong> combines traditional momentum signals with real-time news interpretation by large language models (LLMs). The idea is straightforward: stocks with strong past returns are candidates for momentum portfolios, but their inclusion and weight are refined by LLM-generated sentiment scores derived from firm-specific news. This hybrid approach improves <strong>risk-adjusted returns</strong> (Sharpe, Sortino) and is particularly effective in concentrated, high-conviction portfolios.</p>



<p class="wp-block-paragraph"><strong>Key Parameters</strong></p>



<p class="wp-block-paragraph">1. <strong>Lookback Window (k)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The number of past days of news considered for sentiment analysis.</li>



<li><strong>Role:</strong> Determines how much recent information the LLM uses to judge momentum continuation.</li>



<li><strong>Example:</strong> If k = 5, the model analyzes the last 5 business days of headlines and summaries for each stock. </li>
</ul>



<p class="wp-block-paragraph">2. <strong>Forecast Horizon (l)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The period over which momentum continuation is predicted.</li>



<li><strong>Role:</strong> Sets the “target” for the LLM’s forecast — how far into the future the model should judge momentum persistence.</li>



<li><strong>Example:</strong> If <code>l = 5</code>, the LLM predicts whether momentum will continue for the next 5 trading days.</li>



<li><strong>Connection to Rebalancing:</strong> The forecast horizon typically aligns with the rebalancing cycle. For weekly rebalancing, the horizon is 5 days; for monthly rebalancing, it’s ~21 days.</li>
</ul>



<p class="wp-block-paragraph">3. <strong>Portfolio Size (m)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The number of stocks selected after LLM scoring.</li>



<li><strong>Role:</strong> Controls how concentrated or diversified the portfolio is.</li>



<li><strong>Example:</strong> From the top 20 YTD performers, you might select the top 10 after sentiment scoring.</li>
</ul>



<p class="wp-block-paragraph">4. <strong>Rebalancing Frequency (T)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> How often the portfolio is updated with new signals.</li>



<li><strong>Role:</strong> Sets the rhythm of portfolio refresh — weekly, monthly, or quarterly.</li>



<li><strong>Example:</strong> Weekly rebalancing means recalculating momentum and sentiment scores every 5 trading days.</li>
</ul>



<p class="wp-block-paragraph"><strong>Concept</strong></p>



<p class="wp-block-paragraph">The strategy begins with a <strong>classic momentum screen</strong>: select the <strong>top 20 S&#038;P 500 companies by year-to-date (YTD) performance</strong>. Instead of stopping there, the approach integrates <strong>large language model (LLM) sentiment analysis</strong> of firm-specific news. By analyzing the <strong>last 5 business days of headlines and summaries</strong>, the LLM produces a score indicating whether momentum is likely to continue.</p>



<p class="wp-block-paragraph">These scores are then used to <strong>re-weight the portfolio</strong>, tilting allocations toward companies with stronger news sentiment. Finally, the portfolio is narrowed to the <strong>top 10 conviction stocks</strong>.</p>



<p class="wp-block-paragraph"><strong>Selected Parameters</strong></p>



<ul class="wp-block-list">
<li><strong>Lookback Window:</strong> 5 days of firm-specific news.</li>



<li><strong>Rebalancing Frequency:</strong> Weekly updates of the portfolio.</li>



<li><strong>Forecast Horizon:</strong> 5 trading days (aligned with the rebalancing cycle).</li>
</ul>



<p class="wp-block-paragraph">This setup ensures that the LLM is asked to judge whether momentum will persist until the next rebalance, making the signals both <strong>short-term and actionable</strong>.</p>
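<p class="wp-block-paragraph">The chosen parameters can be gathered in one place; a sketch with illustrative names (these are not variables from the code below):</p>
<pre>params &lt;- list(
  k = 5,  # lookback window: business days of news
  l = 5,  # forecast horizon: trading days, aligned with rebalancing
  m = 10, # portfolio size after LLM scoring
  t = 5   # rebalancing frequency: every 5 trading days (weekly)
)
</pre>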



<p class="wp-block-paragraph"><strong>What the Code Does Step by Step</strong></p>



<ol class="wp-block-list">
<li><strong>Fetching the Data</strong>
<ul class="wp-block-list">
<li>The <strong>R </strong>script first pulls all S&#038;P 500 tickers.</li>



<li>It calculates <strong>YTD returns</strong> for each stock.</li>



<li>The <strong>top 20 stocks</strong> by performance are selected as momentum candidates.</li>
</ul>
</li>



<li><strong>News Sentiment Analysis with LLM</strong>
<ul class="wp-block-list">
<li>For each of these 20 stocks, the code queries Bing News for recent headlines with <a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/data-science-and-machine-learning#ai-services-in-azure" rel="nofollow" target="_blank"><em><strong>Azure AI Services</strong></em></a>.</li>



<li>The last <strong>5 business days of news</strong> are collected.</li>



<li>These headlines and summaries are sent to a <a href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry" rel="nofollow" target="_blank"><strong><em>Microsoft Foundry</em></strong></a>-hosted LLM.</li>



<li>The LLM outputs a <strong>score (0–1)</strong> indicating whether sentiment supports momentum continuation or signals reversal.</li>
</ul>
</li>



<li><strong>Portfolio Tilting</strong>
<ul class="wp-block-list">
<li>LLM scores are normalized to [-1, +1].</li>



<li>Baseline equal weights are tilted according to these scores.</li>



<li>The <strong>top 10 stocks</strong> by adjusted weight form the final portfolio.</li>
</ul>
</li>
</ol>



<p class="wp-block-paragraph">4. <strong>Visualization</strong></p>



<ul class="wp-block-list">
<li>A styled table is created using the <a href="https://gt.rstudio.com/" rel="nofollow" target="_blank"><em><strong><code>gt</code> package</strong></em></a>.</li>



<li>Adjusted weights are color-coded (red–green gradient).</li>



<li>The final portfolio is saved as an image (<code>top10.png</code>).</li>
</ul>



<p class="wp-block-paragraph"><strong>Strategic Insight</strong></p>



<p class="wp-block-paragraph">This approach mirrors the methodology in the <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5680782" rel="nofollow" target="_blank"><em><strong>Swiss Finance Institute paper</strong></em></a>:</p>



<ul class="wp-block-list">
<li><strong>Momentum ranking</strong> provides the baseline.</li>



<li><strong>LLM sentiment scoring</strong> refines stock selection and weighting.</li>



<li><strong>Portfolio tilting</strong> integrates qualitative news signals into quantitative allocation.</li>
</ul>


<pre>
library(httr)
library(jsonlite)
library(tidyquant)
library(tidyverse)
library(lubridate)
library(gt)
library(gtExtras)
library(scales)
library(showtext)
library(webshot2)

# 1. Environment & Auth Setup
sysfonts::font_add_google(&quot;Roboto Slab&quot;, &quot;roboto_slab&quot;)
showtext_auto()

# Azure & Bing Credentials
bing_key           &lt;- &quot;&lt;your-bing-key&gt;&quot;
bing_endpoint      &lt;- &quot;&lt;your-bing-endpoint&gt;&quot; 
azure_llm_key      &lt;- &quot;&lt;your-llm-key&gt;&quot;
azure_llm_endpoint &lt;- &quot;&lt;your-llm-endpoint&gt;&quot;

# R Part (first): S&P 500 Screening with YTD Returns
sp500_tickers &lt;- 
  tq_index(&quot;SP500&quot;) %&gt;% 
  select(symbol, company)

# Calculate YTD change
momentum_df &lt;- 
  sp500_tickers %&gt;%
  tq_get(get = &quot;stock.prices&quot;, from = floor_date(today(), &quot;year&quot;)) %&gt;%
  group_by(symbol) %&gt;%
  arrange(date) %&gt;% # Ensure chronological order for first/last functions
  summarize(
    total_return = (last(adjusted) / first(adjusted)) - 1, 
    .groups = &quot;drop&quot;
  ) %&gt;%
  inner_join(sp500_tickers, by = &quot;symbol&quot;) %&gt;%
  slice_max(total_return, n = 20) %&gt;%
  select(symbol, company)

# R Part (second): News Search and LLM Analysis
analyze_momentum_continuation &lt;- function(ticker, company_name) {
  
  # Construct the Bing News Search query: ticker + &#039; stock&#039;
  query_str &lt;- paste0(ticker, &quot; stock&quot;)
  news_url &lt;- paste0(bing_endpoint, &quot;v7.0/news/search&quot;)
  
  # Call Bing News Search API
  news_res &lt;- GET(news_url, add_headers(`Ocp-Apim-Subscription-Key` = bing_key), 
                  query = list(q = query_str, count = 5, freshness = &quot;Day&quot;))
  
  # Short pause to throttle calls and avoid rate limit errors
  Sys.sleep(1)
  
  news_text &lt;- &quot;&quot;
  if (status_code(news_res) == 200) {
    content &lt;- fromJSON(content(news_res, &quot;text&quot;, encoding = &quot;UTF-8&quot;))
    if (length(content$value) &gt; 0) {
      news_text &lt;- paste(content$value$name, content$value$description, collapse = &quot; | &quot;)
    }
  }
  
  # Construct the LLM payload for Azure AI Foundry
  prompt_payload &lt;- list(
    messages = list(
      list(role = &quot;system&quot;, content = &quot;You are an LLM Enhanced Momentum Investing Agent.&quot;),
      list(role = &quot;user&quot;, content = paste0(
        &quot;Headlines + Summaries for &quot;, company_name, &quot; (&quot;, ticker, &quot;): &quot;, news_text,
        &quot;\nPerform sentiment analysis based on the last 5 days of news (lookback=5, horizon=5). &quot;,
        &quot;Infer whether sentiment supports momentum continuation or signals reversal. &quot;,
        &quot;Return a JSON object with two fields: &#039;subsector&#039; (string) and &#039;llm_score&#039; (string: probability 0-1).&quot;))
    ),
    temperature = 0.1
  )
  
  llm_res &lt;- POST(url = azure_llm_endpoint, 
                  add_headers(`api-key` = azure_llm_key, `Content-Type` = &quot;application/json&quot;),
                  body = prompt_payload, encode = &quot;json&quot;)
  
  if (status_code(llm_res) == 200) {
    llm_out &lt;- fromJSON(content(llm_res, &quot;text&quot;, encoding = &quot;UTF-8&quot;))
    # Parse JSON response without using regex
    llm_json_data &lt;- fromJSON(llm_out$choices$message$content)
    return(as.data.frame(llm_json_data))
  } else {
    return(data.frame(subsector = &quot;N/A&quot;, llm_score = &quot;0.5&quot;))
  }
}

# Execute Analysis: Merge top 20 tickers with LLM scores
news_scores_df &lt;- momentum_df %&gt;%
  mutate(analysis = map2(symbol, company, analyze_momentum_continuation)) %&gt;%
  unnest(analysis) %&gt;%
  mutate(llm_score = as.numeric(llm_score))

# R Part (final): Portfolio Tilting and Visualization
# Normalize scores to [-1, +1] and tilt weights
tilted_portfolio &lt;- news_scores_df %&gt;%
  mutate(
    norm_score = rescale(llm_score, to = c(-1, 1), from = c(0, 1)),
    base_weight = 1 / n(),
    adj_weight = base_weight * (1 + norm_score)
  ) %&gt;%
  mutate(adj_weight = adj_weight / sum(adj_weight)) %&gt;%
  slice_max(adj_weight, n = 10)

# Create gt visualization using original column names
final_table &lt;- tilted_portfolio %&gt;%
  select(company, subsector, adj_weight) %&gt;%
  gt() %&gt;%
  tab_header(title = &quot;Top 10 Tilted S&P 500 Momentum Stocks&quot;) %&gt;%
  # Use cols_label for human-readable labels without renaming underlying columns
  cols_label(
    company = &quot;Company&quot;, 
    subsector = &quot;Subsector&quot;, 
    adj_weight = &quot;Adjusted Weight &quot;
  ) %&gt;%
  # Apply color intensity with scales::col_numeric
  data_color(
    columns = adj_weight, 
    colors = col_numeric(palette = c(&quot;red&quot;, &quot;green&quot;), domain = NULL)
  ) %&gt;%
  fmt_percent(
    columns = contains(&quot;adj_weight&quot;), 
    decimals = 2,
    locale = &quot;en&quot; 
  ) %&gt;% 
  cols_align(align = &quot;center&quot;) %&gt;%
  opt_table_font(font = google_font(&quot;Roboto Slab&quot;))

# Save the visualization as top10.png using webshot
gtsave(final_table, &quot;top10.png&quot;)
</pre>


<figure data-wp-context="{"imageId":"69c7da09160ee"}" data-wp-interactive="core/image" data-wp-key="69c7da09160ee" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" data-attachment-id="11859" data-permalink="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/top10/" data-orig-file="https://datageeek.com/wp-content/uploads/2026/03/top10.png" data-orig-size="1500,1530" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="top10" data-image-description="" data-image-caption="" data-medium-file="https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=294" data-large-file="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1" alt="" class="wp-image-11859" srcset_temp="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1 1004w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=147 147w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=294 294w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=768 768w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=1440 1440w, https://datageeek.com/wp-content/uploads/2026/03/top10.png 1500w" sizes="(max-width: 1004px) 100vw, 1004px" 
data-recalc-dims="1" /></figure>



<p class="wp-block-paragraph"><strong>Final Observation</strong></p>



<p class="wp-block-paragraph">Looking at the resulting portfolio, one striking feature is the <strong>dominance of energy and petroleum companies</strong>. Firms such as <strong>ConocoPhillips, EOG Resources, ExxonMobil, Occidental Petroleum, Marathon Petroleum, Valero, Phillips 66, Chevron, and Baker Hughes</strong> all appear prominently.</p>



<p class="wp-block-paragraph">This heavy tilt toward energy is not random—it reflects how <strong>geopolitical tensions (Iran and U.S.–Israel war dynamics)</strong> have amplified the importance of oil and gas in global markets. News sentiment around these companies has been strongly supportive of continued momentum, pushing them into the top allocation slots.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/"> DataGeeek</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/">Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</a>]]></content:encoded>
					
		
		<enclosure url="https://datageeek.com/wp-content/uploads/2026/03/image.png" length="0" type="" />
<enclosure url="https://1.gravatar.com/avatar/db5e3f9ef188ea98fe38ab05c5a3fad9fb52fe3472715a8fc02f7ea41731f77c?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=1004" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400135</post-id>	</item>
		<item>
		<title>Why Learning R is a Good Career Move in 2026</title>
		<link>https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/</link>
		
		<dc:creator><![CDATA[The Jumping Rivers Blog]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 23:59:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Over the course of my career as a Data Scientist, I’ve worked on projects ranging from simple code reviews to large application builds. For the most part, I have used R to do this.<br />
If you’re getting into coding or data science, one q...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/">Why Learning R is a Good Career Move in 2026</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"> The Jumping Rivers Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>
<a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/">
<img src="https://i1.wp.com/www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/featured.png?w=400&#038;ssl=1" style="width:400px" class="image-center" style="display: block; margin: auto;" data-recalc-dims="1" />
</a>
</p>
<p>Over the course of my career as a Data Scientist, I’ve worked on projects ranging from simple code reviews to large application builds. For the most part, I have used R to do this.</p>
<p>If you’re getting into coding or data science, one question you’re probably asking yourself is <em>“Which language should I learn?”</em></p>
<p>This blog aims to show you why R might be a good decision.</p>
<aside class="advert">
<p>
Join us for our AI in Production conference! For more details, check out our
<a href="https://ai-in-production.jumpingrivers.com/" rel="nofollow" target="_blank">conference website!</a>
</p>
</aside>
<hr>
<h2 id="r-was-built-for-data-not-just-programming">R was built for data (not just programming)</h2>
<p>Unlike general-purpose languages (such as Python), R was designed specifically for statistics and data analysis.</p>
<p>That means:</p>
<ul>
<li>Built-in statistical tools</li>
<li>Powerful visualisation capabilities</li>
<li>Research-level methods available immediately</li>
</ul>
<p>With packages like the <strong>tidyverse</strong>, you can clean, analyse, and visualise data with surprisingly little code.</p>
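<p>As a quick illustrative sketch (my own example, using the built-in <code>mtcars</code> dataset rather than anything from a real project), a {dplyr} summary can be just a few lines:</p>
<pre><code class="language-r">library(dplyr)

# Average fuel economy and group size by number of cylinders
mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n = n())
</code></pre>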
<hr>
<h2 id="high-demand-in-analytics-research-and-healthcare">High demand in analytics, research, and healthcare</h2>
<p>R is especially popular in sectors such as:</p>
<ul>
<li>Healthcare &#038; biostats</li>
<li>Academic research</li>
<li>Government departments</li>
<li>Finance &#038; risk modeling</li>
<li>Pharmaceutical companies</li>
</ul>
<p>Here are some examples of R in production use:</p>
<ul>
<li>The <a href="https://github.com/bbc/bbplot" rel="nofollow" target="_blank">{bbplot} R package</a>. Yes, the BBC use R to create graphics for their website!</li>
<li>Health and wellbeing profiling <a href="https://shiny.posit.co/r/gallery/government-public-sector/scotpho-profiles/" rel="nofollow" target="_blank">app</a> for the NHS</li>
<li>During the Covid-19 pandemic, the Financial Times ran a <a href="https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938" rel="nofollow" target="_blank">stats tracker</a> whose graphs were built with R.</li>
</ul>
<p>Knowing some R will give you a competitive edge if you’re looking to work in these sectors.</p>
<hr>
<h2 id="open-source-with-the-backing-of-posit">Open source with the backing of Posit</h2>
<p>R is open source. This means that:</p>
<ul>
<li>It’s free, and always will be!</li>
<li>Anyone can view the source code that makes up R.</li>
<li>Most R packages make their source code publicly available, for example on <a href="https://github.com/" rel="nofollow" target="_blank">GitHub.com</a>, for everyone to see.</li>
<li>It has a large community of contributors. There are great forums for getting help, such as <a href="https://stackoverflow.com/questions/tagged/r?tab=Votes" rel="nofollow" target="_blank">Stack Overflow</a>, <a href="https://forum.posit.co/" rel="nofollow" target="_blank">Posit Community</a>, the <a href="https://rweekly.org/" rel="nofollow" target="_blank">R Weekly newsletter</a>, and many more.</li>
<li>Thousands of packages provide functionality beyond what paid software such as SPSS, SAS, or Excel offers.</li>
</ul>
<p><a href="https://posit.co/" rel="nofollow" target="_blank">Posit</a>, who maintain the free-to-use RStudio and Positron IDEs (integrated development environments), have many full-time staff working on maintaining and building new functionality for the R ecosystem. This means we get:</p>
<ul>
<li>Defined accountability</li>
<li>Predictable release cycles</li>
<li>Faster bug fixes</li>
</ul>
<hr>
<h2 id="incredible-data-visualisation-possibilities">Incredible data visualisation possibilities</h2>
<p>Being able to communicate your findings with stakeholders is very important in data science, and one of R’s biggest strengths is visualisation and reporting.</p>
<p>With the <strong>{ggplot2}</strong> package, you can create publication-ready charts with very little code. The <a href="https://r-graph-gallery.com/best-r-chart-examples.html" rel="nofollow" target="_blank">R Graph Gallery</a> has some amazing examples of what is possible with {ggplot2}.</p>
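<p>As a minimal sketch (again my own example on the built-in <code>mtcars</code> data, not a chart from the gallery), a complete {ggplot2} plot looks like this:</p>
<pre><code class="language-r">library(ggplot2)

# Scatter plot of weight vs. fuel economy, coloured by cylinder count
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()
</code></pre>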
<p>With the <strong>{quarto}</strong> and <strong>{shiny}</strong> packages, you can build reproducible reports and interactive dashboards, all without needing to know any HTML, CSS or JavaScript.</p>
<h2 id="beginner-friendly-learning-curve">Beginner friendly learning curve</h2>
<p>This is very much my own opinion, but compared to other languages, I think R is fairly intuitive and feels rewarding much earlier in the journey. It also has (in my opinion) the most beginner-friendly program to code in: RStudio.</p>
<p>Most people attend only two days’ worth of training with Jumping Rivers, and say they feel ready to start tackling their own data problems.</p>
<hr>
<h2 id="so-is-r-worth-learning-in-2026">So… is R worth learning in 2026?</h2>
<p>I think so. If you want pure software engineering or large-scale production systems, you may need Python. But for becoming a <strong>strong data thinker</strong>, and giving you an edge in your analysis, R is one of the best starting points.</p>
<p>
For updates and revisions to this article, see the <a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/">original post</a>
</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"> The Jumping Rivers Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/">Why Learning R is a Good Career Move in 2026</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400093</post-id>	</item>
		<item>
		<title>ECMLE on CRAN</title>
		<link>https://www.r-bloggers.com/2026/03/ecmle-on-cran/</link>
		
		<dc:creator><![CDATA[xi'an]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 23:26:05 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://xianblog.wordpress.com/?p=62503</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> x
</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ecmle-on-cran/">ECMLE on CRAN</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/"> R – Xi&#039;an&#039;s Og</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><a href="https://www.r-pkg.org/pkg/ECMLE" rel="nofollow" target="_blank"><img loading="lazy" data-attachment-id="62504" data-permalink="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle-pdf/" data-orig-file="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png" data-orig-size="1017,1153" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="Screenshot 2026-03-26 at 20-20-50 ECMLE Approximating Evidence via Bounded Harmonic Means &#8211; ECMLE.pdf" data-image-description="" data-image-caption="" data-medium-file="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=265" data-large-file="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1" class="aligncenter wp-image-62504 size-large" title="As a companion to our recent paper on Approximating evidence via bounded harmonic means, an R package named ECMLE for elliptical covering marginal likelihood estimator has been accepted by CRAN and is now available. The coding and deposit were made by my PhD student, Dana Naderi, main author of the paper." 
src="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1" alt="" width="450" height="510" srcset_temp="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1 450w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=900 900w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=85 85w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=265 265w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=768 768w" sizes="(max-width: 450px) 100vw, 450px" data-recalc-dims="1" /></a>x</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/"> R – Xi&#039;an&#039;s Og</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ecmle-on-cran/">ECMLE on CRAN</a>]]></content:encoded>
					
		
		<enclosure url="https://0.gravatar.com/avatar/3bddf040412784bc8ff54f0b6353b2c283c3eb7e11daccf2b3bfa95b469e4029?s=96&#038;d=https://s0.wp.com/i/mu.gif&#038;r=G" length="0" type="" />
<enclosure url="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=450" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400115</post-id>	</item>
		<item>
		<title>February 2026 Top 40 New CRAN Packages</title>
		<link>https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/</link>
		
		<dc:creator><![CDATA[Joseph Rickert]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rworks.dev/posts/Feb-2026-Top40/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Two hundred and fifty-five of the new packages submitted to CRAN in February were still there in mid-March. Here are my Top 40 picks in seventeen categories: Artificial Intelligence, Biology, Buddhism, Climate Science, Computational Methods, Dat...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/">February 2026 Top 40 New CRAN Packages</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rworks.dev/posts/Feb-2026-Top40/"> R Works</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>Two hundred and fifty-five of the new packages submitted to CRAN in February were still there in mid-March. Here are my Top 40 picks in seventeen categories: Artificial Intelligence, Biology, Buddhism, Climate Science, Computational Methods, Data, Ecology, Epidemiology, Genomics, Machine Learning, Medical Applications, Physics, Statistics, Surveys, Time Series, Utilities, and Visualization.</p>
<div class="columns">
<div class="column" style="width:45%;">
<section id="artificial-intelligence" class="level3">
<h3 class="anchored" data-anchor-id="artificial-intelligence">Artificial Intelligence</h3>
<p><a href="https://cran.r-project.org/package=quallmer" rel="nofollow" target="_blank">quallmer</a> v0.3.0: Provides tools for AI-assisted qualitative data coding using large language models (‘LLMs’) via the <code>ellmer</code> package, supporting providers including <em>OpenAI</em>, <em>Anthropic</em>, <em>Google</em>, <em>Azure</em>, and local models via <em>Ollama</em>, with built-in <em>codebooks</em> for common applications such as sentiment analysis and policy coding. Functions enable creating custom codebooks, support systematic replication across models and settings, compute inter-coder reliability statistics and validation metrics, and provide audit trails for documenting coding workflows following <a href="https://www.amazon.com/Naturalistic-Inquiry-Yvonna-S-Lincoln/dp/0803924313" rel="nofollow" target="_blank">Lincoln and Guba’s (1985)</a> framework for establishing trustworthiness in qualitative research. See the <a href="https://cran.r-project.org/web/packages/quallmer/vignettes/getting-started.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
</section>
<section id="biology" class="level3">
<h3 class="anchored" data-anchor-id="biology">Biology</h3>
<p><a href="https://cran.r-project.org/package=BioGSP" rel="nofollow" target="_blank">BioGSP</a> v1.0.0: Implements Graph Signal Processing methods, including the Spectral Graph Wavelet Transform, for analyzing spatial patterns in biological data, and provides tools for multi-scale analysis of biological spatial signals, including forward and inverse transforms, energy analysis, and visualization functions tailored for biological applications. See <a href="https://www.sciencedirect.com/science/article/pii/S1063520310000552?via%3Dihub" rel="nofollow" target="_blank">Hammond, Vandergheynst, and Gribonval (2011)</a> and <a href="https://www.biorxiv.org/content/10.1101/2024.12.20.629650v1" rel="nofollow" target="_blank">Yao et al. (2024)</a> for biological application examples, and the <a href="https://cran.r-project.org/web/packages/BioGSP/vignettes/sgwt_simulation_demo.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/BioGSP.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-1" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/BioGSP.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots of Fourier Modes" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=RuHere" rel="nofollow" target="_blank">RuHere</a> v1.0.1: Automatically flags common spatial errors in biological data using metadata, a six-stage workflow, and functions that specifically integrate specialist-curated range information to identify geographic errors and introductions that often escape standard automated validation procedures. For details on the methodology see <a href="https://www.biorxiv.org/content/10.64898/2026.02.02.703373v1" rel="nofollow" target="_blank">Trindade &#038; Caron (2026)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/RuHere/vignettes/sampling_bias.html" rel="nofollow" target="_blank">Reducing sampling bias</a> and <a href="https://cran.r-project.org/web/packages/RuHere/vignettes/spatial_consistency.html" rel="nofollow" target="_blank">Ensuring spatial consistency</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/RuHere.jpeg?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-2" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/RuHere.jpeg?w=578&#038;ssl=1" class="img-fluid" alt="Snapshot of interactive map" data-recalc-dims="1"></a></p>
</section>
<section id="buddhism" class="level3">
<h3 class="anchored" data-anchor-id="buddhism">Buddhism</h3>
<p><a href="https://cran.r-project.org/package=tipitaka.critical" rel="nofollow" target="_blank">tipitaka.critical</a> v1.0.0: A lemmatized critical edition of the complete Pali Canon (Tipitaka), the canonical scripture of Theravadin Buddhism. Based on a five-witness collation of the Pali Text Society edition via GRETIL, SuttaCentral, the Vipassana Research Institute Chattha Sangayana edition, the Buddha Jayanti Tipitaka, and the Thai Royal Edition. All text is lemmatized using the Digital Pali Dictionary, grouping inflected forms by dictionary headword. Covers all three pitakas, Sutta, Vinaya, Abhidhamma, with 5,777 individual text units. For background on the collation method, see <a href="https://github.com/dangerzig/tipitaka.critical" rel="nofollow" target="_blank">Zigmond (2026)</a> and the <a href="https://cran.r-project.org/web/packages/tipitaka.critical/vignettes/tipitaka-critical.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/tipitaka.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-3" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/tipitaka.png?w=578&#038;ssl=1" class="img-fluid" alt="PCA plot of all Tipitaka texts" data-recalc-dims="1"></a></p>
</section>
<section id="climate-science" class="level3">
<h3 class="anchored" data-anchor-id="climate-science">Climate Science</h3>
<p><a href="https://cran.r-project.org/package=tidyextreme" rel="nofollow" target="_blank">tidyextreme</a> v1.00: Provides functions to calculate <a href="https://www.wcrp-climate.org/etccdi" rel="nofollow" target="_blank">Expert Team on Climate Change Detection and Indices</a> (ETCCDI) indices from daily or hourly temperature and precipitation data, along with functions for flexible data handling. See the <a href="https://cran.r-project.org/web/packages/tidyextreme/vignettes/tidyextreme-tutorial.html" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="computational-methods" class="level3">
<h3 class="anchored" data-anchor-id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=compositional.mle" rel="nofollow" target="_blank">compositional.mle</a> v2.0.0: Provides composable optimization strategies for maximum likelihood estimation. Solvers are first-class functions that combine via sequential chaining, parallel racing, and random restarts. Implements gradient ascent, Newton-Raphson, quasi-Newton (BFGS), and derivative-free methods with support for constrained optimization and tracing. Returns <code>mle</code> objects compatible with <code>algebraic.mle</code> for downstream analysis. Methods are based on <a href="https://link.springer.com/book/10.1007/978-0-387-40065-5" rel="nofollow" target="_blank">Nocedal J, Wright SJ (2006)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/compositional.mle/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/compositional.mle/vignettes/theory-and-intuition.html" rel="nofollow" target="_blank">Theory and Intuition Behind Numerical MLE</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/compmle.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-4" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/compmle.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of log-likelihood surface" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=nabla" rel="nofollow" target="_blank">nabla</a> v0.7.1: Enables exact automatic differentiation for <code>R</code> functions and provides a composable derivative operator D that computes gradients, Hessians, Jacobians, and arbitrary-order derivative tensors at machine precision. D(D(f)) gives Hessians, D(D(D(f))) gives third-order tensors for skewness of maximum likelihood estimators, and so on to any order. Works through any R code including loops, branches, and control flow. There are five vignettes including an <a href="https://cran.r-project.org/web/packages/nabla/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/nabla/vignettes/mle-workflow.html" rel="nofollow" target="_blank">Gradient and Hessian Computation</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/nabla.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-5" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/nabla.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of log-likelihood and gradient" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=rLifting" rel="nofollow" target="_blank">rLifting</a> v0.9.0: Performs Wavelet Lifting Transforms focusing on signal denoising and functional data analysis (FDA). Implements a hybrid architecture with a zero-allocation <code>C++</code> core for high-performance processing. Features include unified offline (batch) denoising; causal, real-time filtering using a ring-buffer engine; and adaptive recursive thresholding. There are five vignettes including an <a href="https://cran.r-project.org/web/packages/rLifting/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/rLifting/vignettes/realtime.html" rel="nofollow" target="_blank">Real-time signal smoothing</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/rLifting.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-6" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/rLifting.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot showing real-time denoising of a noisy sine wave" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=setweaver" rel="nofollow" target="_blank">setweaver</a> v1.0.0: Creates sets of variables based on mutual information. In this context, a set is a collection of distinct elements (e.g., variables) that can also be treated as a single entity. Mutual information quantifies the dependence between two variables by expressing how much information about one variable can be gained from observing the other. See the <a href="https://cran.r-project.org/web/packages/setweaver/vignettes/setweaver.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/setweaver.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-7" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/setweaver.png?w=578&#038;ssl=1" class="img-fluid" alt="Logistic Regression Effect Probability Network" data-recalc-dims="1"></a></p>
</section>
<section id="data" class="level3">
<h3 class="anchored" data-anchor-id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=geobounds" rel="nofollow" target="_blank">geobounds</a> v0.1.1: Provides tools for downloading data from <a href="https://www.geoboundaries.org/" rel="nofollow" target="_blank">geoBoundaries</a>. Several administrative levels are available. See <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231866" rel="nofollow" target="_blank">Runfola, D. et al. (2020)</a> for background and the <a href="https://cran.r-project.org/web/packages/geobounds/vignettes/geobounds.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/geobounds.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-8" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/geobounds.png?w=578&#038;ssl=1" class="img-fluid" alt="World Bank Income Group LatinAmerica andthe Caribbean" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=healthbR" rel="nofollow" target="_blank">healthbR</a> v0.2.0: Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), PNAD Continua (Continuous National Household Sample Survey), POF (Household Budget Survey with food security and consumption data), Censo Demografico (population denominators), SIM (Mortality Information System), SINASC (Live Birth Information System) and several other repositories. Data is downloaded from the Brazilian Ministry of Health and is returned in tidy format following tidyverse conventions. There are seventeen vignettes including an <a href="https://cran.r-project.org/web/packages/healthbR/vignettes/healthbR.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/healthbR/vignettes/vigitel-telephone-survey.html" rel="nofollow" target="_blank">Chronic Disease Risk Factors</a>.</p>
<p><a href="https://cran.r-project.org/package=nhanesdata" rel="nofollow" target="_blank">nhanesdata</a> v4.1.0: Instant access to harmonized National Health and Nutrition Examination Survey <a href="https://www.cdc.gov/nchs/nhanes/" rel="nofollow" target="_blank">NHANES</a> data spanning 1999-2023. Retrieve pre-processed datasets from reliable cloud storage with automatic type reconciliation and integrated search tools for variables and datasets. Simplifies NHANES data workflows by handling cycle management and maintaining data consistency across survey waves. There are four vignettes including <a href="https://cran.r-project.org/web/packages/nhanesdata/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/nhanesdata/vignettes/getting-started.html" rel="nofollow" target="_blank">Available NAHANES Datasets</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/NHANES.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-9" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/NHANES.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots of NHANES Age Distribution bySurvey Cycle" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=roadDB" rel="nofollow" target="_blank">roadDB</a> v1.1.0: Provides interface to the ROCEEH Out of Africa Database (<a href="https://www.roceeh.uni-tuebingen.de/roadweb/smarty_road_simple_search.php" rel="nofollow" target="_blank">ROAD</a>), a comprehensive resource for archaeological, anthropological, paleoenvironmental and geographic data from Africa and Eurasia dating from 3,000,000 to 20,000 years BP. Users can retrieve data from the online database at different levels of detail and customize search requests. See <a href="https://cran.r-project.org/web/packages/roadDB/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
</section>
<section id="ecology" class="level3">
<h3 class="anchored" data-anchor-id="ecology">Ecology</h3>
<p><a href="https://cran.r-project.org/package=spacemodR" rel="nofollow" target="_blank">spacemodR</a> v0.1.3: Provides tools for modeling food web transfer based on an initial ground raster. It provides a directed acyclic graph structure for a set of rasters representing the flow of elements (e.g., food, energy, contaminants). It also includes tools for working with dispersal algorithms, enabling the combination of flux data with population movement. See the <a href="https://cran.r-project.org/web/packages/spacemodR/vignettes/Tutorial.html" rel="nofollow" target="_blank">tutorial</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/spacemodR.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-10" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/spacemodR.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of OCS GE (Occupation du Sol à Grande Échelle) data set" data-recalc-dims="1"></a></p>
</section>
<section id="epidemiology" class="level3">
<h3 class="anchored" data-anchor-id="epidemiology">Epidemiology</h3>
<p><a href="https://cran.r-project.org/package=baselinenowcast" rel="nofollow" target="_blank">baselinenowcast</a> v0.2.0: Provides nowcasting methods based on using empirical delay distributions and uncertainty from past performance as well as a baseline method for developers of new nowcasting methods. The package supports standard data frame inputs as well as the direct use of reporting triangles, and is compatible with <a href="https://www.epinowcast.org/" rel="nofollow" target="_blank">epinowcast</a> objects and accommodates a wide spectrum of reporting schedules, including mixed patterns of reference and reporting (daily-weekly, weekly-daily). For background see <a href="https://wellcomeopenresearch.org/articles/10-614" rel="nofollow" target="_blank">Johnson et al. (2026)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/baselinenowcast/vignettes/baselinenowcast.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://rworks.dev/posts/Feb-2026-Top40/" rel="nofollow" target="_blank">Mathematical Methods</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/nowcast.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-11" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/nowcast.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot comparing initially reported with subsequently observed cases" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=multigroup.vaccine" rel="nofollow" target="_blank">multigroup.vaccine</a> v0.1.1: Provides functions for modeling infectious disease dynamics in populations with multiple subgroups having different vaccination rates, transmission characteristics, and contact patterns. Enables calculating outbreak sizes, automatically fetching U.S. census data, and exploring vaccination scenarios with an interactive <code>shiny</code> dashboard. See <a href="https://www.valueinhealthjournal.com/article/S1098-3015(24)00154-2/fulltext?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1098301524001542%3Fshowall%3Dtrue" rel="nofollow" target="_blank">Nguyen et al. (2024)</a> and <a href="https://academic.oup.com/ofid/article/13/Supplement_1/ofaf695.217/8420075?login=false" rel="nofollow" target="_blank">Duong et al. (2026)</a> for background. There are four vignettes including <a href="https://cran.r-project.org/web/packages/multigroup.vaccine/vignettes/run_model_on_command_line.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/multigroup.vaccine/vignettes/measles_agemodel.html" rel="nofollow" target="_blank">Measles Age-Structured Model</a>.</p>
<p><a href="https://cran.r-project.org/package=ViroReportR" rel="nofollow" target="_blank">ViroReportR</a> v1.0.4: Implements tools for reporting and forecasting viral respiratory infections, using case surveillance data. Report generation tools for short-term forecasts, and validation metrics for an arbitrary number of customizable respiratory viruses. Estimation of the effective reproduction number is based on the <em>EpiEstim</em> framework described in work by <a href="https://academic.oup.com/aje/article/178/9/1505/89262?login=false" rel="nofollow" target="_blank">Cori et al. (2013)</a>. See the <a href="https://academic.oup.com/aje/article/178/9/1505/89262?login=false" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="genomics" class="level3">
<h3 class="anchored" data-anchor-id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=archipelago" rel="nofollow" target="_blank">archipelago</a> v0.1.0: Provides a graphical method for joint visualization of Variant Set Association Test results and individual variant association statistics. The Archipelago method assigns genomic coordinates to variant set statistics, allowing simultaneous display of variant-level and set-level signals in a unified plot supporting interpretation of both collective and individual variant contributions. For more see <a href="https://onlinelibrary.wiley.com/doi/10.1002/gepi.70025" rel="nofollow" target="_blank">Lawless et al. (2026)</a> and the <a href="https://cran.r-project.org/web/packages/archipelago/vignettes/archipelago-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/archipelago.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-12" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/archipelago.png?w=578&#038;ssl=1" class="img-fluid" alt="Example of an Archipelago plot" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/web/packages/SVG/vignettes/SVG-introduction.html" rel="nofollow" target="_blank">SVG</a> v1.0.0: Implements a unified framework for detecting spatially variable genes in spatial transcriptomics data. SVG detection methods including MERINGUE (Moran’s I based spatial autocorrelation), Giotto binSpect (binary spatial enrichment test), SPARK-X (non-parametric kernel-based test), and nnSVG (nearest-neighbor Gaussian processes) which are described in <a href="https://genome.cshlp.org/content/31/10/1843" rel="nofollow" target="_blank">Miller et al. (2021)</a>, <a href="https://link.springer.com/article/10.1186/s13059-021-02286-2" rel="nofollow" target="_blank">Dries et al. (2021)</a>, <a href="https://link.springer.com/article/10.1186/s13059-021-02404-0" rel="nofollow" target="_blank">Zhu et al. (2021)</a>, and <a href="https://www.nature.com/articles/s41467-023-39748-z" rel="nofollow" target="_blank">Weber et al. (2023)</a>. See the <a href="https://cran.r-project.org/web/packages/SVG/vignettes/SVG-introduction.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/SVG.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-13" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/SVG.png?w=578&#038;ssl=1" class="img-fluid" alt="Spatial pattern visualization" data-recalc-dims="1"></a></p>
</section>
</div><div class="column" style="width:10%;">

</div><div class="column" style="width:45%;">
<section id="machine-learning" class="level3">
<h3 class="anchored" data-anchor-id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=nadir" rel="nofollow" target="_blank">nadir</a> v0.0.1: Provides a functional programming implementation of the super learner algorithm, <a href="https://biostats.bepress.com/ucbbiostat/paper222/" rel="nofollow" target="_blank">van der Laan et al. (2007)</a>, with an emphasis on supporting the use of formulas to specify learners. Includes the ability to use random-effects specified in formulas e.g. (y ~ (age | strata) + …) and to construct new learners by passing a functions. See the <a href="https://cran.r-project.org/web/packages/nadir/vignettes/Basic-Examples.html" rel="nofollow" target="_blank">vignette</a> for basic examples.</p>
</section>
<section id="medical-applications" class="level3">
<h3 class="anchored" data-anchor-id="medical-applications">Medical Applications</h3>
<p><a href="https://cran.r-project.org/package=bfbin2arm" rel="nofollow" target="_blank">bfbin2arm</a> v0.1.0: Provides tools to design and analyze two-arm binomial clinical (phase II) trials using Bayes factors. Implements Bayes factors for point-null and directional hypotheses, predictive densities under different hypotheses, and power and sample size calibration with optional frequentist type-I error and power. See the <a href="https://cran.r-project.org/web/packages/bfbin2arm/vignettes/bfbin2arm-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/bfbin2arm.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-14" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/bfbin2arm.png?w=578&#038;ssl=1" class="img-fluid" alt="Power and Type 1 Error Rate Plots" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=neuromapr" rel="nofollow" target="_blank">neuromapr</a> v0.2.1: Implements spatial null models and coordinate-space transformations for statistical comparison of brain maps, following the framework described in <a href="https://www.nature.com/articles/s41592-022-01625-w" rel="nofollow" target="_blank">Markello et al. (2022)</a>. Provides variogram-matching surrogates, Moran spectral randomization, and spin-based permutation tests. Includes an <code>R</code> interface to the <a href="https://netneurolab.github.io/neuromaps/user_guide/annotations.html" rel="nofollow" target="_blank">neuromaps</a> annotation registry for browsing, downloading, and comparing brain map annotations from the Open Science Framework. There are five vignettes including <a href="https://cran.r-project.org/web/packages/neuromapr/vignettes/neuromapr.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/neuromapr/vignettes/surface-geometry.html" rel="nofollow" target="_blank">Surface Geometry</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/neuromapr.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-15" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/neuromapr.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Null Correlation Distributions" data-recalc-dims="1"></a></p>
</section>
<section id="physics" class="level3">
<h3 class="anchored" data-anchor-id="physics">Physics</h3>
<p><a href="https://cran.r-project.org/package=HaDeX2" rel="nofollow" target="_blank">HaDeX2</a> v1.0.0: Process, analyze and visualize Hydrogen Deuterium eXchange monitored by Mass Spectrometry experiments (HDX-MS) via a new reproducible workflow for the analysis of the HDX-MS data that includes uncertainty propagation, data aggregation and visualization on 3D structure, functions for data exploration, quality control and generation of publication-quality figures and a companion <code>Shiny</code> application. There are eleven vignettes including <a href="https://cran.r-project.org/web/packages/HaDeX2/vignettes/datafiles.html" rel="nofollow" target="_blank">Calculations</a> and <a href="https://cran.r-project.org/web/packages/HaDeX2/vignettes/visualization.html" rel="nofollow" target="_blank">Data Visualization</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/HaDeX2.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-16" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/HaDeX2.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Deuterium uptake vs Peptide ID" data-recalc-dims="1"></a></p>
</section>
<section id="statistics" class="level3">
<h3 class="anchored" data-anchor-id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=BCFM" rel="nofollow" target="_blank">BCFM</a> v1.0.0: Implements the Bayesian Clustering Factor Models for simultaneous clustering and latent factor analysis of multivariate longitudinal data. The model accounts for within-cluster dependence through shared latent factors while allowing heterogeneity across clusters, enabling flexible covariance modeling in high-dimensional settings. The methodology is described in <a href="https://onlinelibrary.wiley.com/doi/10.1002/sim.70350" rel="nofollow" target="_blank">Shin, Ferreira, and Tegge (2018)</a>. See the <a href="https://cran.r-project.org/web/packages/BCFM/vignettes/introduction-to-BCFM.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/BCFM.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-17" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/BCFM.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of posterior densities for cluster probabilities" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=clmstan" rel="nofollow" target="_blank">clmstan</a> v0.1.1: Provides functions to fit cumulative link models for ordinal categorical data using <code>CmdStanR</code>. Supports various link functions including logit, probit, cloglog, loglog, cauchit, and flexible parametric links such as Generalized Extreme Value, Asymmetric Exponential Power, and Symmetric Power. Methods are described in <a href="https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2011.00601.x" rel="nofollow" target="_blank">Agresti (2010)</a>, <a href="https://link.springer.com/article/10.1007/s10651-010-0154-8" rel="nofollow" target="_blank">Wang and Dey (2011)</a>, and <a href="https://dl.acm.org/doi/abs/10.1007/s11222-014-9449-1" rel="nofollow" target="_blank">Naranjo, Perez, and Martin (2015)</a>. See the <a href="https://cran.r-project.org/web/packages/clmstan/vignettes/getting-started.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=dtms" rel="nofollow" target="_blank">dtms</a> v0.4.2: Implements discrete-time multistate models, several ways of estimating parametric and nonparametric multistate models, and an extensive set of Markov chain methods which use transition probabilities derived from the multistate model. See <a href="https://www.tandfonline.com/doi/full/10.1080/00324728.2023.2176535" rel="nofollow" target="_blank">Schneider et al. (2024)</a>, <a href="https://journals.sagepub.com/doi/10.1177/0049124118782541" rel="nofollow" target="_blank">Dudel (2021)</a>, <a href="https://link.springer.com/article/10.1186/s12963-020-00217-0" rel="nofollow" target="_blank">Dudel &#038; Myrskylä (2020)</a>, and <a href="https://www.taylorfrancis.com/books/mono/10.1201/9781315374321/multi-state-survival-models-interval-censored-data-ardo-van-den-hout" rel="nofollow" target="_blank">van den Hout (2017)</a> for background and <a href="https://cran.r-project.org/web/packages/dtms/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/dtms.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-18" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/dtms.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of evolution of transition probabilities" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=GAReg" rel="nofollow" target="_blank">GAReg</a> v0.1.0: Provides a genetic algorithm framework for regression problems requiring discrete optimization over model spaces with unknown or varying dimension, where gradient-based methods and exhaustive enumeration are impractical. The computation is built on the <em>GA</em> engine of <a href="https://journal.r-project.org/articles/RJ-2017-008/index.html" rel="nofollow" target="_blank">Scrucca (2017)</a>, and <em>changepointGA</em> engine from <a href="https://arxiv.org/abs/2410.15571" rel="nofollow" target="_blank">Li and Lu (2024)</a>. In challenging high-dimensional settings, functions enable efficient search and delivers near-optimal solutions. See the <a href="https://cran.r-project.org/web/packages/GAReg/vignettes/vignette.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/GAReg.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-19" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/GAReg.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot showing spline options" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=mixpower" rel="nofollow" target="_blank">mixpower</a> v0.1.0: Implements a simulation-based toolkit for power and sample-size analysis for linear and generalized linear mixed-effects models (LMMs and GLMMs). Supports Gaussian, binomial, Poisson, and negative binomial families via <code>lme4</code>; Wald and likelihood-ratio tests; multi-parameter sensitivity grids; power curves and minimum sample-size solvers; parallel evaluation with deterministic seeds; and functions for reproducibility. Run time diagnostics include failure rate, singular-fit rate, effective N and publication-ready summary tables. There are five brief vignettes including an <a href="https://cran.r-project.org/web/packages/mixpower/vignettes/mixpower-intro.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/mixpower/vignettes/mixpower-simulations.html" rel="nofollow" target="_blank">Running simulations</a>.</p>
<p><a href="https://cran.r-project.org/package=rblimp" rel="nofollow" target="_blank">rblimp</a> v1.0.: Provides an interface to <a href="https://www.appliedmissingdata.com/blimp" rel="nofollow" target="_blank"><code>Blimp</code></a> software for Bayesian latent variable modeling, missing data analysis, and multiple imputation. The package generates <code>Blimp</code> syntax, executes <code>Blimp</code> models, and imports results back into <code>R</code> as structured objects with methods for visualization and analysis. See <a href="https://cran.r-project.org/web/packages/rblimp/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=rareflow" rel="nofollow" target="_blank">rareflow</a> v0.1.0: Provides variational flow-based methods for modeling rare events using Kullback–Leibler divergence, normalizing flows, Girsanov change of measure, and Freidlin–Wentzell action functionals and tools for rare-event inference, minimum-action paths, and quasi-potential computation in stochastic dynamical systems. Methods are based on <a href="https://arxiv.org/abs/1505.05770" rel="nofollow" target="_blank">Rezende and Mohamed (2015)</a>, <a href="https://epubs.siam.org/doi/10.1137/1105027" rel="nofollow" target="_blank">Girsanov (1960)</a>, and <a href="https://link.springer.com/book/10.1007/978-3-642-25847-3" rel="nofollow" target="_blank">Freidlin and Wentzell (2012)</a>. See the <a href="https://cran.r-project.org/web/packages/rareflow/vignettes/rareflow.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/rareflow.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-20" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/rareflow.png?w=578&#038;ssl=1" class="img-fluid" alt="2D potential plot" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=sshist" rel="nofollow" target="_blank">sshist</a> v0.1.3: Implements the Shimazaki-Shinomoto method for optimizing the bin width of a histogram. This method minimizes the mean integrated squared error and features a <code>C++</code> backend for high performance and shift-averaging to remove edge-position bias. Ideally suited for time-dependent rate estimation and identifying intrinsic data structures. Supports both 1D and 2D data distributions. See <a href="https://direct.mit.edu/neco/article-abstract/19/6/1503/7188/A-Method-for-Selecting-the-Bin-Size-of-a-Time?redirectedFrom=fulltext" rel="nofollow" target="_blank">Shimazaki and Shinomoto (2007)</a> for more details and the <a href="https://cran.r-project.org/web/packages/sshist/vignettes/introduction.html" rel="nofollow" target="_blank">vignette</a> for an introduction.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/sshist.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-21" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/sshist.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing optimal histogram for cost minimization" data-recalc-dims="1"></a></p>
</section>
<section id="surveys" class="level3">
<h3 class="anchored" data-anchor-id="surveys">Surveys</h3>
<p><a href="https://cran.r-project.org/package=heaping" rel="nofollow" target="_blank">heaping</a> v0.1.0: Provides methods for correcting heaping (digit preference) in survey data at the individual record level. Age heaping, where respondents disproportionately report ages ending in 0 or 5, is a common phenomenon that can distort demographic analyses. Unlike traditional smoothing methods that only correct aggregated statistics, this package corrects individual values by replacing a calculated proportion of heaped observations with draws from fitted truncated distributions. See the <a href="https://cran.r-project.org/web/packages/heaping/vignettes/heaping-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/heaping.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-22" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/heaping.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing corrections for heaping" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=metasurvey" rel="nofollow" target="_blank">metasurvey</a> v0.0.21: Provides a step-based pipeline for reproducible survey data processing, building on the <code>survey</code> package for complex sampling designs. Supports rotating panels with bootstrap replicate weights, and provides a recipe system for sharing and reproducing data transformation workflows across survey editions. There are thirteen vignettes including <a href="https://cran.r-project.org/web/packages/metasurvey/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/metasurvey/vignettes/complex-designs.html" rel="nofollow" target="_blank">Survey design and Validation</a>.</p>
</section>
<section id="time-series" class="level3">
<h3 class="anchored" data-anchor-id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=mhpfilter" rel="nofollow" target="_blank">mhpfilter</a> v0.1.0: Implements the Modified Hodrick-Prescott Filter for decomposing macroeconomic time series into trend and cyclical components via efficient <code>C++</code> routines. Unlike the standard HP filter, functions estimate series-specific lambda values that minimize the GCV criterion. See <a href="https://www.tandfonline.com/doi/abs/10.1080/00036846.2014.894631" rel="nofollow" target="_blank">Choudhary, Hanif and Iqbal (2014)</a>, and <a href="https://www.elibrary.imf.org/view/journals/024/1997/001/article-A003-en.xml" rel="nofollow" target="_blank">Coe and McDermott (1997)</a> for background. There is an <a href="https://cran.r-project.org/web/packages/mhpfilter/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/mhpfilter/vignettes/methodology.html" rel="nofollow" target="_blank">Modified HP Filter Theory</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/mhpfilter.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-23" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/mhpfilter.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing effects of lambda" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=RegimeChange" rel="nofollow" target="_blank">RegimeChange</a> v0.1.1: Implements a unified framework for detecting regime changes (changepoints) in time series data that includes both frequentist and Bayesian methods for univariate and multivariate series with detection of changes in mean, variance, trend, and distributional properties. See <a href="https://academic.oup.com/biomet/article-abstract/41/1-2/100/456627?redirectedFrom=fulltext&#038;login=false" rel="nofollow" target="_blank">Page (1954)</a>, <a href="https://www.tandfonline.com/doi/full/10.1080/01621459.2012.737745" rel="nofollow" target="_blank">Killick, Fearnhead, and Eckley (2012)</a> for frequentist methods and <a href="https://arxiv.org/abs/0710.3742" rel="nofollow" target="_blank">Adams and MacKay (2007)</a>. for Bayesian methods. There are three vignettes including and <a href="https://cran.r-project.org/web/packages/RegimeChange/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/RegimeChange/vignettes/bayesian-methods.html" rel="nofollow" target="_blank">Bayesian Changepoint Detection</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/RegimeChange.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-24" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/RegimeChange.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of changepoint detection with PELT" data-recalc-dims="1"></a></p>
</section>
<section id="utilities" class="level3">
<h3 class="anchored" data-anchor-id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=birddog" rel="nofollow" target="_blank">birddog</a> v1.0.0: Provides a unified set of methods to detect scientific emergence and technological trajectories in academic papers and patents by combining citation network analysis with community detection and attribute extraction, also applying natural language processing and structural topic modeling to uncover the contents of research communities. Applications of the method include: <a href="https://regepe.org.br/regepe/article/view/1742" rel="nofollow" target="_blank">Souza et al. (2022)</a> and <a href="https://www.mdpi.com/2071-1050/15/2/967" rel="nofollow" target="_blank">Maria et al. (2023)</a>. See <a href="https://cran.r-project.org/web/packages/birddog/readme/README.html" rel="nofollow" target="_blank">README</a> for the methodology and look <a href="https://roneyfraga.com/birddog/articles/introduction_birddog.html" rel="nofollow" target="_blank">here</a> for an introduction and examples.</p>
<p><a href="https://rworks.dev/posts/Feb-2026-Top40/birddog.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-25" rel="nofollow" target="_blank"><img src="https://rworks.dev/posts/Feb-2026-Top40/birddog.svg" class="img-fluid" alt="Methodology Workflow"></a></p>
<p><a href="https://cran.r-project.org/package=phinterval" rel="nofollow" target="_blank">phinterval</a> v1.0.0: Implements the phinterval vector class for representing time spans that may contain gaps (disjoint intervals) or be empty. This class generalizes the <code>lubridate</code> package’s interval class to support vectorized set operations (intersection, union, difference, complement) that always return a valid time span, even when disjoint or empty intervals are created. See the <a href="https://cran.r-project.org/web/packages/phinterval/vignettes/phinterval.html" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="visualization" class="level3">
<h3 class="anchored" data-anchor-id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=dtGAP" rel="nofollow" target="_blank">dtGAP</a> v0.0.2: Provides supervised generalized association plots based on decision trees and enhances decision tree visualization by incorporating Generalized Association Plots through matrix-based visualizations including confusion matrix maps, decision tree matrix maps, and predicted class membership maps. See <a href="https://cran.r-project.org/web/packages/dtGAP/readme/README.html" rel="nofollow" target="_blank">README</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/dtGAP.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-26" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/dtGAP.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of decision tree with heatmap" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=ggInterval" rel="nofollow" target="_blank">ggInterval</a> v0.2.4: Extends <code>ggplot2</code> to visualize symbolic interval-valued data with various plots via more general and flexible input arguments, and provides a function to transform classical data into symbolic data using both clustering algorithms and customized methods. See the <a href="https://cran.r-project.org/web/packages/ggInterval/vignettes/ggInterval_Intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/ggInterval.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-27" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/ggInterval.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Index Image-Column Condition" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=nomiShape" rel="nofollow" target="_blank">nomiShape</a> v1.0.1: Provides tools for visualizing and analyzing the shape of discrete nominal frequency distributions and introduces centered frequency plots, in which nominal categories are ordered from the most frequent category at the center toward less frequent categories on both sides, facilitating the detection of distributional patterns such as uniformity, dominance, symmetry, skewness, and long-tail behavior. In addition, the package supports Pareto charts for the study of dominance and cumulative frequency structure in nominal data. There are twelve vignettes including <a href="https://cran.r-project.org/web/packages/nomiShape/vignettes/nominal_distribution_shapes.html" rel="nofollow" target="_blank">Visualizing and Analyzing Distributions of Nominal Variables</a> and <a href="https://cran.r-project.org/web/packages/nomiShape/vignettes/pareto.html" rel="nofollow" target="_blank">Pareto Plots for Nominal Distributions</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/nomiShape.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-28" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/nomiShape.png?w=578&#038;ssl=1" class="img-fluid" alt="Example of a Pareto Plot" data-recalc-dims="1"></a></p>
</section>
</div>
</div>



 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rworks.dev/posts/Feb-2026-Top40/"> R Works</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/">February 2026 Top 40 New CRAN Packages</a>]]></content:encoded>
					
		
		<enclosure url="https://rworks.dev/posts/Feb-2026-Top40/archipelago.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400109</post-id>	</item>
		<item>
		<title>You shall know a word by the company it keeps — so choose your prompts wisely</title>
		<link>https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/</link>
		
		<dc:creator><![CDATA[Pablo Bernabeu]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/</guid>

					<description><![CDATA[<p>In computational linguistics, word meanings are shaped by their contexts. As the British linguist John Rupert Firth put it in 1957, ‘You shall know a word by the company it keeps’ (see Brunila &#038; LaViolette, 2022, for a re-examination of the intellectual history). It sounds almost like life advice, but Firth meant something ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/">You shall know a word by the company it keeps — so choose your prompts wisely</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/"> Pablo Bernabeu</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/clipboard/clipboard.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/xaringanExtra-clipboard/xaringanExtra-clipboard.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/xaringanExtra-clipboard/xaringanExtra-clipboard.js"></script>
<script>window.xaringanExtraClipboard(null, {"button":"Copy Code","success":"Copied!","error":"Press Ctrl+C to Copy"})</script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/htmltools-fill/fill.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/htmlwidgets/htmlwidgets.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-binding/plotly.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/typedarray/typedarray.min.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/jquery/jquery.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/crosstalk/css/crosstalk.min.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/crosstalk/js/crosstalk.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-htmlwidgets-css/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-main/plotly-latest.min.js"></script>


<style type="text/css">
.figure p.caption, figcaption {
  text-align: left;
}
</style>
<script>
// Collapse code chunks by default in this post
document.addEventListener("DOMContentLoaded", function() {
  document.querySelectorAll("details").forEach(function(d) {
    d.open = false;
    var s = d.querySelector("summary");
    if (s) {
      s.textContent = "Expand";
      s.style.fontWeight = "bold";
      s.style.fontSize = "103%";
      s.style.color = "#379E8A";
    }
  });
});
</script>
<p>In computational linguistics, word meanings are shaped by their contexts. As the British linguist John Rupert Firth put it in 1957, ‘You shall know a word by the company it keeps’ (see <a href="https://doi.org/10.18653/v1/2022.naacl-main.327" rel="nofollow" target="_blank">Brunila &#038; LaViolette, 2022</a>, for a re-examination of the intellectual history). It sounds almost like life advice, but Firth meant something technical: words that habitually appear alongside each other tend to share semantic territory. The adjective ‘good’, for instance, is far more likely to appear near ‘kind’, ‘genuine’, ‘fair’ and ‘quality’ than near ‘broken’ or ‘fraud’ – and a model that tracks those neighbours can learn what ‘good’ means without ever being told. The principle extends to polysemy: ‘bank’ means something entirely different in the company of ‘river’ and ‘fishing rod’ than in the company of ‘overdraft’ and ‘mortgage’. Context is everything.</p>
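<p>As a toy illustration of the principle (a hypothetical mini-corpus invented for this sketch, not data from any study cited here), simply counting which words share a sentence with ‘bank’ already hints at its senses:</p>

```r
# Toy co-occurrence counts: which words keep company with "bank"?
toy <- c("the river bank was muddy",
         "we fished from the bank of the river",
         "the bank approved the mortgage",
         "an overdraft at the bank")
tokens <- strsplit(toy, " ")
co <- table(unlist(lapply(tokens, function(s)
  if ("bank" %in% s) setdiff(s, "bank"))))
sort(co, decreasing = TRUE)
# "river" co-occurs twice; "mortgage" and "overdraft" once each --
# and "the" tops the list, which is why stopwords are removed later on.
```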
<p>This deceptively simple insight is the bedrock on which generative AI was built. The earliest computational implementations of Firth’s principle – distributional semantic models such as Latent Semantic Analysis (LSA; <a href="https://doi.org/10.1037/0033-295X.104.2.211" rel="nofollow" target="_blank">Landauer &#038; Dumais, 1997</a>) and the Hyperspace Analogue to Language (<a href="https://doi.org/10.3758/BF03204766" rel="nofollow" target="_blank">Lund &#038; Burgess, 1996</a>) – were modest by today’s standards: a matrix of word co-occurrence counts, a few hundred latent dimensions and a vocabulary of perhaps tens of thousands of words. Yet even these pocket-sized models captured real-world structure with startling fidelity. <a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" rel="nofollow" target="_blank">Louwerse and Zwaan (2009)</a> showed that the frequency with which city names co-occur in English text predicts their actual geographical distances: cities close together on a map tend to be mentioned together more often, and an LSA model trained on text alone can reconstruct approximate maps of the United States without ever seeing one. <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" rel="nofollow" target="_blank">Louwerse (2011)</a> extended this further, showing that text statistics encode not just geography but sensory properties, emotional associations and conceptual relationships across a wide range of domains. Indeed, distributional language statistics may track some sensorimotor properties of concepts (<a href="https://doi.org/10.17635/lancaster/thesis/1795" rel="nofollow" target="_blank">Bernabeu, 2022</a>; <a href="https://doi.org/10.1111/j.1551-6709.2010.01157.x" rel="nofollow" target="_blank">Louwerse &#038; Connell, 2011</a>; cf. 
<a href="https://doi.org/10.1038/s41562-025-02203-8" rel="nofollow" target="_blank">Xu et al., 2025</a>), especially after fine-tuning on human sensorimotor ratings (<a href="https://doi.org/10.48550/arXiv.2603.03313" rel="nofollow" target="_blank">Wu et al., 2026</a>). In short, language does not merely label the world – it encodes its structure, and even a simple co-occurrence model can read that encoding back.</p>
<p>We can see this for ourselves. The R code included below (click ‘Expand’ to view it) applies LSA – one of the simplest distributional models – to three text collections, projects the resulting word vectors into two dimensions via PCA (principal component analysis) and plots them. In brief, LSA builds a <em>term-document matrix</em> (a large table recording how often each word appears in each document), weights it with TF-IDF (term frequency–inverse document frequency, which highlights words distinctive to particular documents rather than ubiquitous everywhere) and then compresses it via <em>truncated SVD</em> (singular value decomposition, a form of dimensionality reduction). Each corpus is split into two groups: the most <em>distinctive</em> words per group (selected by the difference in mean TF-IDF weight between groups) are plotted in the group’s colour, while the most frequent <em>shared</em> words appear in purple. Words that co-occur in similar contexts cluster together; words from different domains drift apart.</p>
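<p>For intuition, TF-IDF can be computed by hand on a toy term-document matrix (invented counts, not the corpora analysed below; <code>tm::weightTfIdf</code> uses a normalised variant, but the logic is the same):</p>

```r
# Hand-rolled TF-IDF on a 3-term, 2-document toy matrix
tdm <- matrix(c(3, 0,   # "oil":    3 hits in d1, none in d2
                0, 2,   # "shares": none in d1, 2 in d2
                1, 1),  # "price":  once in each document
              nrow = 3, byrow = TRUE,
              dimnames = list(c("oil", "shares", "price"), c("d1", "d2")))
idf   <- log(ncol(tdm) / rowSums(tdm > 0))  # terms rare across documents score high
tfidf <- tdm * idf
tfidf  # "price" is weighted to zero: present everywhere, distinctive nowhere
```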
<p>PCA works by finding new axes – principal components – that capture the maximum variance in the data. Each word receives a <em>loading</em> on each component: a number ranging from −1 to +1 that indicates how strongly that word contributes to that axis of variation (a gentle introduction to PCA in R is <a href="https://pablobernabeu.github.io/2018/naive-principal-component-analysis-in-r" rel="nofollow" target="_blank">available in an earlier post on this blog</a>). High absolute loadings on a component mean that the word is a strong marker of the distinction that component captures.</p>
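<p>The range of the loadings is easy to verify: in base R’s <code>prcomp()</code>, they live in the columns of <code>$rotation</code>, each of which is a unit vector, so no entry can exceed ±1 (random toy data below, purely for illustration):</p>

```r
# Loadings from prcomp(): unit-length columns, so entries lie in [-1, 1]
set.seed(1)
x <- matrix(rnorm(40), ncol = 4)   # 10 observations, 4 variables
p <- prcomp(x, scale. = TRUE)
round(p$rotation[, 1:2], 3)        # loadings on PC1 and PC2
colSums(p$rotation^2)              # each column has squared length 1
```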
<p>How are the thematic groups decided? The code computes the mean TF-IDF weight of every word in each group of documents and then takes the difference. Words whose weight is much higher in group A than in group B are classified as distinctive to A, and vice versa. The top 15 words at each extreme become the coloured labels in the plot, while the most frequent words that do not belong to either extreme are labelled ‘Shared’. The grouping is therefore entirely data-driven: no human decides which words are ‘finance’ or ‘energy’ – the corpus statistics do. Above each plot, a table shows the mean loading of each thematic group on the first two principal components, with the highest positive loading per group highlighted in bold. A high absolute loading tells us that a given group of words is strongly aligned with that component – in other words, that the component captures precisely the distinction between those groups. When one group loads heavily on PC1 while another does not, the first principal component is essentially the axis that separates them.</p>
<div id="reuters-newswire-finance-vs-energy" class="section level3">
<h3>Reuters Newswire: Finance vs Energy</h3>
<p>The first corpus uses two classic newswire collections from the <code>tm</code> package (<a href="https://doi.org/10.18637/jss.v025.i05" rel="nofollow" target="_blank">Feinerer et al., 2008</a>): <code>acq</code> (50 Reuters articles on corporate acquisitions) and <code>crude</code> (20 articles on crude oil markets). Both have been standard NLP benchmarks since the 1980s (<a href="http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html" rel="nofollow" target="_blank">Lewis, 1997</a>). The code builds a TF-IDF weighted term-document matrix, reduces it to a 20-dimensional LSA space via truncated SVD, and computes pairwise cosine similarities – a standard measure of how close two word vectors sit, on a scale from –1 (opposite) to +1 (identical) – using <code>LSAfun::Cosine()</code> (<a href="https://doi.org/10.3758/s13428-015-0662-x" rel="nofollow" target="_blank">Günther et al., 2016</a>). The PCA loadings table and word-vector plot below show the results.</p>
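<p>Cosine similarity itself is a one-liner; a hand-rolled version (equivalent in spirit to <code>LSAfun::Cosine()</code>, though that function looks words up in a vector space rather than taking raw vectors) makes the scale explicit:</p>

```r
# Cosine similarity: dot product over the product of vector lengths
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine(c(1, 2, 0), c(2, 4, 0))  # parallel vectors: 1
cosine(c(1, 0), c(0, 1))        # orthogonal vectors: 0
cosine(c(1, 0), c(-1, 0))       # opposite vectors: -1
```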
<pre>pkgs &lt;- c(&quot;LSAfun&quot;, &quot;tm&quot;, &quot;ggplot2&quot;, &quot;plotly&quot;)
invisible(lapply(pkgs, function(p)
  if (!requireNamespace(p, quietly = TRUE)) install.packages(p)))
library(LSAfun)
library(tm)
library(ggplot2)
library(plotly)

# --- Reusable helper: LSA + PCA plot ------------------------------------
# Builds a TF-IDF term-document matrix, computes a truncated SVD,
# selects the most distinctive and most shared words and projects them
# to 2D via PCA *on the selected words only* for maximum spread.

lsa_pipeline &lt;- function(doc_list, labels, grp_a, grp_b,
                         lab_a, lab_b, colour_a, colour_b,
                         top_n = 15, n_shared = 10,
                         k = 20, min_docs = 4) {
  corp &lt;- VCorpus(VectorSource(doc_list))
  corp &lt;- tm_map(corp, content_transformer(tolower))
  corp &lt;- tm_map(corp, removePunctuation)
  corp &lt;- tm_map(corp, removeNumbers)
  corp &lt;- tm_map(corp, removeWords, stopwords(&quot;en&quot;))
  corp &lt;- tm_map(corp, stripWhitespace)
  tdm  &lt;- as.matrix(TermDocumentMatrix(corp,
             control = list(weighting = weightTfIdf,
                            bounds = list(global = c(min_docs, Inf)))))
  k_use &lt;- min(as.integer(k), nrow(tdm) - 1L, ncol(tdm) - 1L)
  sv    &lt;- svd(tdm, nu = k_use, nv = k_use)
  wlsa  &lt;- sv$u %*% diag(sv$d[1:k_use])
  rownames(wlsa) &lt;- rownames(tdm)
  idx_a  &lt;- which(labels == grp_a)
  idx_b  &lt;- which(labels == grp_b)
  mean_a &lt;- rowMeans(tdm[, idx_a, drop = FALSE])
  mean_b &lt;- rowMeans(tdm[, idx_b, drop = FALSE])
  total  &lt;- mean_a + mean_b
  spec   &lt;- mean_a - mean_b           # positive = distinctive to A
  top_a  &lt;- names(sort(spec, decreasing = TRUE))[1:top_n]
  top_b  &lt;- names(sort(spec, decreasing = FALSE))[1:top_n]
  shared_pool &lt;- setdiff(names(sort(total, decreasing = TRUE)),
                         c(top_a, top_b))
  shared &lt;- head(shared_pool, n_shared)
  hl   &lt;- unique(c(top_a, top_b, shared))
  hl   &lt;- hl[hl %in% rownames(wlsa)]
  # PCA on the selected words only, for better spatial spread
  wlsa_hl &lt;- wlsa[hl, , drop = FALSE]
  pca  &lt;- prcomp(wlsa_hl, scale. = FALSE)
  cd   &lt;- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                     word = rownames(wlsa_hl))
  cd$topic &lt;- ifelse(cd$word %in% top_a &#038; !cd$word %in% top_b, lab_a,
              ifelse(cd$word %in% top_b &#038; !cd$word %in% top_a, lab_b,
                     &quot;Shared&quot;))
  p &lt;- ggplot(cd, aes(PC1, PC2, colour = topic,
                      text = paste0(word, &quot; (&quot;, topic, &quot;)&quot;))) +
    geom_point(size = 0, alpha = 0) +
    scale_colour_manual(values = setNames(c(colour_a, colour_b, &quot;#7B2D8E&quot;),
                                          c(lab_a, lab_b, &quot;Shared&quot;)),
                        guide = guide_legend(override.aes = list(size = 3, alpha = 1))) +
    labs(x = &quot;Principal Component 1&quot;, y = &quot;Principal Component 2&quot;,
         colour = NULL) +
    theme_minimal(base_size = 12) +
    theme(legend.position = &quot;bottom&quot;,
          legend.margin   = margin(t = -5),
          axis.title.x    = element_text(margin = margin(t = 12)),
          axis.title.y    = element_text(margin = margin(r = 12)),
          plot.margin     = margin(0, 0, 0, 0))

  # Map each word to its group colour for label text
  col_map &lt;- setNames(c(colour_a, colour_b, &quot;#7B2D8E&quot;),
                      c(lab_a, lab_b, &quot;Shared&quot;))
  cd$label_col &lt;- col_map[cd$topic]

  # Trim spatial outliers so the dense cluster is readable.
  # Words beyond the IQR fence are dropped from the plot (not from LSA).
  q1  &lt;- quantile(cd$PC1, 0.25); q3 &lt;- quantile(cd$PC1, 0.75)
  iqr &lt;- q3 - q1; fence &lt;- 2.5
  keep &lt;- cd$PC1 &gt;= (q1 - fence * iqr) &#038; cd$PC1 &lt;= (q3 + fence * iqr)
  q1y &lt;- quantile(cd$PC2, 0.25); q3y &lt;- quantile(cd$PC2, 0.75)
  iqry &lt;- q3y - q1y
  keep &lt;- keep &#038; cd$PC2 &gt;= (q1y - fence * iqry) &#038; cd$PC2 &lt;= (q3y + fence * iqry)
  cd &lt;- cd[keep, , drop = FALSE]

  pp &lt;- ggplotly(p, tooltip = &quot;text&quot;)
  # Hide all ggplot traces from plot AND legend
  for (tr in seq_along(pp$x$data)) {   # tr, not k: avoid shadowing the k argument
    pp$x$data[[tr]]$marker$size    &lt;- 0.1
    pp$x$data[[tr]]$marker$opacity &lt;- 0
    pp$x$data[[tr]]$showlegend &lt;- FALSE
  }
  # Constrain axes to the data range (with a small pad)
  pad_x &lt;- diff(range(cd$PC1)) * 0.06
  pad_y &lt;- diff(range(cd$PC2)) * 0.06
  # Add text traces per group (toggleable via legend)
  legend_groups &lt;- c(lab_a, lab_b, &quot;Shared&quot;)
  legend_cols   &lt;- c(colour_a, colour_b, &quot;#7B2D8E&quot;)
  offscreen_x &lt;- max(cd$PC1) + pad_x * 50
  offscreen_y &lt;- max(cd$PC2) + pad_y * 50
  for (i in seq_along(legend_groups)) {
    grp &lt;- legend_groups[i]
    grp_data &lt;- cd[cd$topic == grp, , drop = FALSE]
    if (nrow(grp_data) == 0) next
    # Text trace at actual positions (no legend entry)
    pp &lt;- pp %&gt;% add_trace(
      x = grp_data$PC1, y = grp_data$PC2,
      type = &quot;scatter&quot;, mode = &quot;text&quot;,
      text = grp_data$word,
      textfont = list(size = 11, color = legend_cols[i]),
      name = grp, legendgroup = grp, showlegend = FALSE,
      hoverinfo = &quot;text&quot;,
      hovertext = paste0(grp_data$word, &quot; (&quot;, grp, &quot;)&quot;),
      inherit = FALSE
    )
    # Legend-only marker trace (off-screen, linked via legendgroup)
    pp &lt;- pp %&gt;% add_trace(
      x = offscreen_x, y = offscreen_y, type = &quot;scatter&quot;, mode = &quot;markers&quot;,
      marker = list(size = 12, color = legend_cols[i], opacity = 1,
                    symbol = &quot;circle&quot;),
      name = grp, legendgroup = grp, showlegend = TRUE,
      hoverinfo = &quot;skip&quot;, inherit = FALSE
    )
  }
  pp &lt;- pp %&gt;% layout(
    legend = list(orientation = &quot;h&quot;, x = 1, xanchor = &quot;right&quot;,
                  y = -0.12, tracegroupgap = 4, itemwidth = 30,
                  itemsizing = &quot;constant&quot;,
                  font = list(size = 12),
                  bordercolor = &quot;#CCCCCC&quot;, borderwidth = 1,
                  bgcolor = &quot;#FAFAFA&quot;,
                  xpad = 4, ypad = 10),
    xaxis = list(title = list(text = &quot;Principal Component 1&quot;,
                              standoff = 8),
                 range = c(min(cd$PC1) - pad_x, max(cd$PC1) + pad_x)),
    yaxis = list(title = list(text = &quot;Principal Component 2&quot;,
                              standoff = 8),
                 range = c(min(cd$PC2) - pad_y, max(cd$PC2) + pad_y)),
    margin = list(b = 60)
  )
  list(plot = pp, lsa = wlsa, tdm = tdm, pca = pca, words = cd)
}

# --- 1. Reuters newswire ------------------------------------------------
data(acq)
data(crude)

docs   &lt;- c(lapply(acq, content), lapply(crude, content))
labels &lt;- c(rep(&quot;acq&quot;, length(acq)), rep(&quot;crude&quot;, length(crude)))

res1 &lt;- lsa_pipeline(docs, labels,
  grp_a = &quot;acq&quot;, grp_b = &quot;crude&quot;,
  lab_a = &quot;Finance&quot;, lab_b = &quot;Energy&quot;,
  colour_a = &quot;#D55E00&quot;, colour_b = &quot;#0072B2&quot;,
  min_docs = 4)

# Cosine similarities in the 20-dimensional LSA space
pairs &lt;- list(
  c(&quot;oil&quot;, &quot;barrel&quot;), c(&quot;shares&quot;, &quot;acquisition&quot;),
  c(&quot;price&quot;, &quot;barrel&quot;), c(&quot;price&quot;, &quot;shares&quot;),
  c(&quot;shares&quot;, &quot;oil&quot;), c(&quot;acquisition&quot;, &quot;barrel&quot;))
pairs &lt;- Filter(function(p) all(p %in% rownames(res1$lsa)), pairs)
sims  &lt;- sapply(pairs, function(p)
  round(Cosine(p[1], p[2], tvectors = res1$lsa), 3))
names(sims) &lt;- sapply(pairs, paste, collapse = &quot; ~ &quot;)
sims
#&gt;         oil ~ barrel shares ~ acquisition       price ~ barrel 
#&gt;                0.675                0.236                0.938 
#&gt;       price ~ shares         shares ~ oil acquisition ~ barrel 
#&gt;                0.129               -0.014               -0.043</pre>
<table>
<caption><span id="tab:loadings-reuters">Table 1: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Energy</td>
<td align="right"><strong>.439</strong></td>
<td align="right">-.321</td>
</tr>
<tr class="even">
<td align="left">Finance</td>
<td align="right">-.265</td>
<td align="right"><strong>.240</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">.172</td>
<td align="right">.069</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-reuters"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-1" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"data":[{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.17444498328263369,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,0.58180820005241962,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"text":["oil (Energy)","prices (Energy)","crude (Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","january (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official (Energy)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,114,178,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,114,178,1)"}},"hoveron":"points","name":"Energy","legendgroup":"Energy","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.49755766520957062,-0.029560512980131078,-0.23995081133082855,-0.40137146185349359,-0.62856787483224308,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,-0.084375547690265146,0.13320400170100985,-0.00012950003840865457,-0.040846894528412295,-0.18311056236070655,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"text":["shares 
(Finance)","inc (Finance)","stock (Finance)","undisclosed (Finance)","common (Finance)","corp (Finance)","terms (Finance)","division (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire (Finance)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(213,94,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(213,94,0,1)"}},"hoveron":"points","name":"Finance","legendgroup":"Finance","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"text":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.029560512980131078,-0.23995081133082855,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,0.13320400170100985,-0.00012950003840865457,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"type":"scatter","mode":"text","text":["shares","inc","stock","common","corp","american","offer","systems","merger","company","purchase","acquire"],"textfont":{"size":11,"color":"#D55E00"},"name":"Finance","legendgroup":"Finance","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["shares (Finance)","inc (Finance)","stock (Finance)","common (Finance)","corp (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire 
(Finance)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#D55E00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Finance","legendgroup":"Finance","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"type":"scatter","mode":"text","text":["oil","prices","crude","opec","saudi","posted","barrel","bpd","barrels","kuwait","brings","last","price","official"],"textfont":{"size":11,"color":"#0072B2"},"name":"Energy","legendgroup":"Energy","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["oil (Energy)","prices (Energy)","crude (Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official 
(Energy)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#0072B2","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Energy","legendgroup":"Energy","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"type":"scatter","mode":"text","text":["mln","dlrs","west","billion","today","market","new","will","pct","contract"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text"],"hovertext":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":45.429638854296407},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.27164951200327742,0.32005956721576839],"tickmode":"array","ticktext":["-0.50","-0.25","0.00","0.25"],"tickvals":[-0.5,-0.25,0,0.25],"categoryorder":"array","categoryarray":["-0.50","-0.25","0.00","0.25"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.28095337663963454,0.32816096039257953],"tickmode":"array","ticktext":["-0.2","0.0","0.2","0.4","0.6"],"tickvals":[-0.20000000000000001,0,0.20000000000000001,0.40000000000000002,0.60000000000000009],"categoryorder":"array","categoryarray":["-0.2","0.0","0.2","0.4","0.6"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d186eda39f9":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d186eda39f9.1":{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.029560512980131078,-0.23995081133082855,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496
545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,0.13320400170100985,-0.00012950003840865457,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"type":"scatter","mode":"text","text":["shares","inc","stock","common","corp","american","offer","systems","merger","company","purchase","acquire"],"textfont":{"size":11,"color":"#D55E00"},"name":"Finance","legendgroup":"Finance","showlegend":false,"hoverinfo":"text","hovertext":["shares (Finance)","inc (Finance)","stock (Finance)","common (Finance)","corp (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire (Finance)"],"inherit":false},"2d186eda39f9.2":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#D55E00","opacity":1,"symbol":"circle"},"name":"Finance","legendgroup":"Finance","showlegend":true,"hoverinfo":"skip","inherit":false},"2d186eda39f9.3":{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"type":"scatter","mode":"text","text":["oil","prices","crude","opec","saudi","posted","barrel","bpd","barrels","kuwait","brings","last","price","official"],"textfont":{"size":11,"color":"#0072B2"},"name":"Energy","legendgroup":"Energy","showlegend":false,"hoverinfo":"text","hovertext":["oil (Energy)","prices (Energy)","crude 
(Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official (Energy)"],"inherit":false},"2d186eda39f9.4":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#0072B2","opacity":1,"symbol":"circle"},"name":"Energy","legendgroup":"Energy","showlegend":true,"hoverinfo":"skip","inherit":false},"2d186eda39f9.5":{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"type":"scatter","mode":"text","text":["mln","dlrs","west","billion","today","market","new","will","pct","contract"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract (Shared)"],"inherit":false},"2d186eda39f9.6":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d186eda39f9","visdat":{"2d186eda39f9":["function (y) 
","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 1: <em>Word Vectors from Reuters Newswire Articles (Finance vs Energy) Projected to Two Dimensions via PCA on a 20-Dimensional LSA Space.</em> Finance terms (vermillion) cluster in a distinct region from energy terms (blue); shared vocabulary occupies intermediate positions. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
<p>The cosine similarities confirm what Figure 1 shows geometrically. Within-domain pairs cluster tightly – <code>oil ~ barrel</code> and <code>price ~ barrel</code> have high positive cosines because these words habitually appear together in oil-market dispatches – while cross-domain pairs like <code>shares ~ oil</code> and <code>acquisition ~ barrel</code> sit near zero: they simply never keep each other’s company. Notice, too, that <code>price ~ shares</code> is far lower than <code>price ~ barrel</code>. The same word, ‘price’, lands in a different region of the space depending on the context in which it predominantly occurs. Firth’s principle, made numerical.</p>
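<p>As a sketch of how these pairwise numbers are computed – assuming the word vectors from the 20-dimensional LSA space sit in a matrix, here called <code>term_vectors</code>, with terms as row names (a stand-in name; use whatever object the pipeline above actually produces) – the cosine is simply the normalised dot product:</p>
<pre># Cosine similarity between two word vectors: the dot product
# of the vectors divided by the product of their norms
cosine_sim &lt;- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

cosine_sim(term_vectors[&quot;oil&quot;, ], term_vectors[&quot;barrel&quot;, ])    # within-domain: high
cosine_sim(term_vectors[&quot;shares&quot;, ], term_vectors[&quot;oil&quot;, ])    # cross-domain: near zero</pre>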
</div>
<div id="state-of-the-union-pre-war-vs-post-war" class="section level3">
<h3>State of the Union: Pre-War vs Post-War</h3>
<p>From newswire to politics. The <code>sotu</code> package provides the full text of every US State of the Union address. Splitting at 1945 – the end of the Second World War – reveals how American political vocabulary has shifted over two centuries: from the constitutional and agrarian language of the early republic to the geopolitical and welfare-state vocabulary of the modern era. The loadings table and figure below present the results.</p>
<pre>if (!requireNamespace(&quot;sotu&quot;, quietly = TRUE)) install.packages(&quot;sotu&quot;)

# Full text and year of every State of the Union address
sotu_texts  &lt;- sotu::sotu_text
sotu_years  &lt;- sotu::sotu_meta$year

# Label each address by era, splitting at the end of the Second World War
sotu_labels &lt;- ifelse(sotu_years &lt; 1945, &quot;Pre-1945&quot;, &quot;Post-1945&quot;)

res2 &lt;- lsa_pipeline(as.list(sotu_texts), sotu_labels,
  grp_a = &quot;Pre-1945&quot;, grp_b = &quot;Post-1945&quot;,
  lab_a = &quot;Pre-1945&quot;, lab_b = &quot;Post-1945&quot;,
  colour_a = &quot;#E69F00&quot;, colour_b = &quot;#009E73&quot;,
  min_docs = 5)</pre>
<table>
<caption><span id="tab:loadings-sotu">Table 2: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Pre-1945</td>
<td align="right">-.845</td>
<td align="right">-.499</td>
</tr>
<tr class="even">
<td align="left">Post-1945</td>
<td align="right">-.492</td>
<td align="right"><strong>.385</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">-.379</td>
<td align="right">.132</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-sotu"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-2" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-2">{"x":{"data":[{"x":[0.013287752989669022,0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.041695332466203217,0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"text":["tonight (Post-1945)","jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,158,115,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,158,115,1)"}},"hoveron":"points","name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.0054604074907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"text":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(230,159,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(230,159,0,1)"}},"hoveron":"points","name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0017478123280863518,0.00068511788801251926,0.022229517128775089,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,0.11738841084164225,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0066652305444180431,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,-0.014350851383924532,0.0033576016266469869],"text":["america (Shared)","americas (Shared)","thank (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","applause (Shared)","workers 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.0054604074907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"type":"scatter","mode":"text","text":["upon","spain","vessels","british","mexico","subject","treasury","commerce","gentlemen","cent","duties","militia","treaty","consideration","territory"],"textfont":{"size":11,"color":"#E69F00"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#E69F00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"type":"scatter","mode":"text","text":["jobs","weve","americans","programs","thats","help","program","budget","billion","percent","soviet","lets","nuclear","economic"],"textfont":{"size":11,"color":"#009E73"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#009E73","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Post-1945","legendgroup":"Post-1945","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[-0.0017478123280863518,0.00068511788801251926,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,0.0033576016266469869],"type":"scatter","mode":"text","text":["america","americas","tax","spending","million","today","get","workers"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text"],"hovertext":["america (Shared)","americas (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","workers 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":51.8057285180573},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.0073186847645708465,0.0029849677221676597],"tickmode":"array","ticktext":["0.00","0.04","0.08","0.12"],"tickvals":[0,0.040000000000000008,0.080000000000000002,0.12000000000000001],"categoryorder":"array","categoryarray":["0.00","0.04","0.08","0.12"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.02253446139082874,0.036050123002134865],"tickmode":"array","ticktext":["-0.02","0.00","0.02","0.04"],"tickvals":[-0.02,0,0.019999999999999997,0.039999999999999987],"categoryorder":"array","categoryarray":["-0.02","0.00","0.02","0.04"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d181f60119":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d181f60119.1":{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.00546040
74907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"type":"scatter","mode":"text","text":["upon","spain","vessels","british","mexico","subject","treasury","commerce","gentlemen","cent","duties","militia","treaty","consideration","territory"],"textfont":{"size":11,"color":"#E69F00"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"hoverinfo":"text","hovertext":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"inherit":false},"2d181f60119.2":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#E69F00","opacity":1,"symbol":"circle"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":true,"hoverinfo":"skip","inherit":false},"2d181f60119.3":{"x":[0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"type":"scatter","mode":"text","text":["jobs","weve","americans","programs","thats","help","program","budget","billion","percent","soviet","lets","nuclear","economic"],"textfont":{"size":11,"color":"#009E73"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"hoverinfo":"text","hovertext":["jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"inherit":false},"2d181f60119.4":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#009E73","opacity":1,"symbol":"circle"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":true,"hoverinfo":"skip","inherit":false},"2d181f60119.5":{"x":[-0.0017478123280863518,0.00068511788801251926,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,0.0033576016266469869],"type":"scatter","mode":"text","text":["america","americas","tax","spending","million","today","get","workers"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["america (Shared)","americas (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","workers (Shared)"],"inherit":false},"2d181f60119.6":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d181f60119","visdat":{"2d181f60119":["function (y) ","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 2: <em>Word Vectors from US State of the Union Addresses Projected to Two Dimensions, Split at 1945.</em> Pre-war speeches (amber) feature constitutional and agrarian vocabulary; post-war speeches (green) shift to geopolitical and welfare-state terms. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
<p>The separation is striking. Table 2 reveals that both groups have negative mean loadings on PC1, so the first component does not cleanly separate them – it mainly captures variance shared across eras (general political vocabulary that appears throughout the full 200-year span). The real separation lives on PC2: pre-war words load negatively while post-war words load positively, confirming that the vertical axis in Figure 2 is the one that distinguishes the two eras. Pre-war presidents address ‘gentlemen’ (the formal salutation of a different era) and discuss ‘vessels’, ‘militia’, ‘commerce’ and ‘treasury’ – the vocabulary of a young republic preoccupied with trade, territorial expansion and the mechanics of governance. Modern presidents speak of ‘tonight’ (State of the Union addresses have been delivered on prime-time television since the mid-1960s), ‘jobs’, ‘nuclear’ and ‘program’ – the vocabulary of a superpower managing a welfare state and a global military presence. Words like ‘america’, ‘today’, ‘tax’ and ‘workers’ anchor both eras, sitting in the shared middle ground.</p>
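<p>The mean loadings in Table 2 are straightforward to recompute – assuming, hypothetically, that <code>res2</code> exposes a per-word data frame of loadings with a <code>group</code> column (adapt the field names to whatever <code>lsa_pipeline()</code> actually returns):</p>
<pre># Mean PC1/PC2 loading per group, as reported in Table 2
# (res2$loadings and its columns are assumed names, not a documented API)
aggregate(cbind(PC1, PC2) ~ group, data = res2$loadings, FUN = mean)</pre>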
</div>
<div id="imdb-film-reviews-positive-vs-negative" class="section level3">
<h3>IMDB Film Reviews: Positive vs Negative</h3>
<p>Now a harder test. The <code>text2vec</code> package includes 5,000 IMDB film reviews labelled as positive or negative – a classic sentiment-analysis benchmark. Unlike the two corpora above, the split here is not by topic but by evaluative tone. Both positive and negative reviews discuss films, characters, plots and acting; the difference lies in the adjectives and evaluative phrasing. This makes the separation task far harder for a simple co-occurrence model – and the result is instructive. The loadings table and figure below present the results.</p>
<pre>if (!requireNamespace(&quot;text2vec&quot;, quietly = TRUE)) install.packages(&quot;text2vec&quot;)

# 5,000 IMDB reviews with a binary sentiment label (1 = positive)
data(&quot;movie_review&quot;, package = &quot;text2vec&quot;)
mv_labels &lt;- ifelse(movie_review$sentiment == 1, &quot;Positive&quot;, &quot;Negative&quot;)

res3 &lt;- lsa_pipeline(as.list(movie_review$review), mv_labels,
  grp_a = &quot;Positive&quot;, grp_b = &quot;Negative&quot;,
  lab_a = &quot;Positive&quot;, lab_b = &quot;Negative&quot;,
  colour_a = &quot;#009E73&quot;, colour_b = &quot;#D55E00&quot;,
  min_docs = 50)</pre>
<table>
<caption><span id="tab:loadings-imdb">Table 3: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Negative</td>
<td align="right"><strong>.329</strong></td>
<td align="right">-.255</td>
</tr>
<tr class="even">
<td align="left">Positive</td>
<td align="right">-.081</td>
<td align="right"><strong>.192</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">.103</td>
<td align="right">.281</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-imdb"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-3" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-3">{"x":{"data":[{"x":[0.18571654615554042,0.099809048091466776,0.54682437946265205,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-1.1752727275347559,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"text":["bad (Negative)","worst (Negative)","waste (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible (Negative)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(213,94,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(213,94,0,1)"}},"hoveron":"points","name":"Negative","legendgroup":"Negative","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],
"text":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life (Positive)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,158,115,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,158,115,1)"}},"hoveron":"points","name":"Positive","legendgroup":"Positive","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988,-2.2878754890645903],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175,-0.6325172817337219],"text":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see (Shared)","show 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],"type":"scatter","mode":"text","text":["great","excellent","best","wonderful","love","family","loved","beautiful","highly","fun","perfect","also","young","favorite","life"],"textfont":{"size":11,"color":"#009E73"},"name":"Positive","legendgroup":"Positive","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life 
(Positive)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#009E73","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Positive","legendgroup":"Positive","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.18571654615554042,0.099809048091466776,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"type":"scatter","mode":"text","text":["bad","worst","awful","poor","nothing","stupid","terrible","worse","even","script","boring","money","plot","horrible"],"textfont":{"size":11,"color":"#D55E00"},"name":"Negative","legendgroup":"Negative","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["bad (Negative)","worst (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible 
(Negative)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#D55E00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Negative","legendgroup":"Negative","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175],"type":"scatter","mode":"text","text":["movie","film","good","one","like","just","story","really","see"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text"],"hovertext":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":45.429638854296407},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.15708146043840815,0.43568344930815545],"tickmode":"array","ticktext":["-2","-1","0"],"tickvals":[-2,-1,0],"categoryorder":"array","categoryarray":["-2","-1","0"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-1.0740618254306211,0.59380262609418188],"tickmode":"array","ticktext":["-1.0","-0.5","0.0","0.5"],"tickvals":[-1,-0.5,0,0.5],"categoryorder":"array","categoryarray":["-1.0","-0.5","0.0","0.5"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d184bbd4f35":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d184bbd4f35.1":{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y
":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],"type":"scatter","mode":"text","text":["great","excellent","best","wonderful","love","family","loved","beautiful","highly","fun","perfect","also","young","favorite","life"],"textfont":{"size":11,"color":"#009E73"},"name":"Positive","legendgroup":"Positive","showlegend":false,"hoverinfo":"text","hovertext":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life (Positive)"],"inherit":false},"2d184bbd4f35.2":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#009E73","opacity":1,"symbol":"circle"},"name":"Positive","legendgroup":"Positive","showlegend":true,"hoverinfo":"skip","inherit":false},"2d184bbd4f35.3":{"x":[0.18571654615554042,0.099809048091466776,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"type":"scatter","mode":"text","text":["bad","worst","awful","poor","nothing","stupid","terrible","worse","even","script","boring","money","plot","horrible"],"textfont":{"size":11,"color":"#D55
E00"},"name":"Negative","legendgroup":"Negative","showlegend":false,"hoverinfo":"text","hovertext":["bad (Negative)","worst (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible (Negative)"],"inherit":false},"2d184bbd4f35.4":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#D55E00","opacity":1,"symbol":"circle"},"name":"Negative","legendgroup":"Negative","showlegend":true,"hoverinfo":"skip","inherit":false},"2d184bbd4f35.5":{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175],"type":"scatter","mode":"text","text":["movie","film","good","one","like","just","story","really","see"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see (Shared)"],"inherit":false},"2d184bbd4f35.6":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d184bbd4f35","visdat":{"2d184bbd4f35":["function (y) 
","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 3: <em>Word Vectors from 5,000 IMDB Film Reviews (Positive vs Negative) Projected to Two Dimensions.</em> Unlike the clean topic-based separations in the Reuters and SOTU corpora, the sentiment-based distinction is much muddier: positive and negative reviews share most of their vocabulary, and evaluative words overlap heavily. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
</div>
<div id="what-the-plots-capture-and-what-they-miss" class="section level3">
<h3>What the Plots Capture – and What They Miss</h3>
<p>Taken together, Figures 1–3 illustrate both the power and the limits of distributional models. Figure 1 captures the real-world distinction between financial and energy markets with striking clarity: domain-specific vocabulary clusters tightly, and a polysemous word like ‘price’ lands in different positions depending on its dominant context – precisely the kind of structure that Louwerse and colleagues have documented at larger scale. Figure 2 captures genuine historical change: pre-war addresses use the vocabulary of a young republic (‘gentlemen’, ‘militia’, ‘vessels’); modern ones use the vocabulary of a superpower (‘tonight’, ‘jobs’, ‘nuclear’), reflecting two centuries of political evolution.</p>
<p>Figure 3, however, reveals a clear limitation. Because both positive and negative reviews discuss the same subject – films – the topical vocabulary is largely shared, and the evaluative words that do separate them (‘excellent’ vs ‘worst’, for instance) form only a thin layer atop a large common vocabulary. A 20-dimensional LSA space simply lacks the resolution to untangle sentiment from topic. The model captures <em>what</em> people write about more easily than <em>how</em> they feel about it.</p>
<p>These imprecisions are not accidental; they reflect a fundamental constraint: model capacity.</p>
</div>
<div id="from-toy-models-to-titans" class="section level2">
<h2>From Toy Models to Titans</h2>
<p>The LSA spaces above used just 20 latent dimensions, trained on corpora of a few dozen to a few thousand documents. The vocabulary that survives the minimum-frequency filter numbers in the low thousands. Under these conditions, the model does a remarkable job of sorting finance from energy or 19th-century language from modern – but it lacks the capacity to encode the subtler distributional cues that distinguish evaluative tone, sarcasm or register.</p>
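<p>That pipeline is compact enough to sketch in base R. The corpus, words and the choice of <code>k = 2</code> below are invented for illustration (the spaces in this post use 20 dimensions on real corpora): build a term-document count matrix, truncate its singular value decomposition to <em>k</em> dimensions, and compare words by cosine similarity in the reduced space.</p>

```r
# A minimal LSA sketch in base R on a toy, invented corpus.
docs <- list(
  c("stock", "price", "market", "shares"),
  c("oil", "price", "barrel", "energy"),
  c("stock", "market", "shares", "trading")
)
vocab <- sort(unique(unlist(docs)))

# Rows = terms, columns = documents, cells = raw counts.
tdm <- sapply(docs, function(d) table(factor(d, levels = vocab)))
rownames(tdm) <- vocab

k <- 2                       # latent dimensions (toy value)
dec <- svd(tdm)              # tdm = U D V'
word_vecs <- dec$u[, 1:k] %*% diag(dec$d[1:k])  # terms in the reduced space
rownames(word_vecs) <- vocab

cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Words that keep the same company end up close together:
cosine(word_vecs["stock", ], word_vecs["shares", ])
cosine(word_vecs["stock", ], word_vecs["barrel", ])
```

<p>Packages such as <code>lsa</code> and <code>LSAfun</code> (Günther et al., 2016) implement the same idea at scale, with weighting schemes and convenience functions this sketch omits.</p>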
<p>The history of distributional models is, in large part, a history of scale. As <a href="https://doi.org/10.1177/09637214241242746" rel="nofollow" target="_blank">Connell and Lynott (2024)</a> illustrate, growth in model size over the past three decades has been staggering. The LSA models of the late 1990s (<a href="https://doi.org/10.1037/0033-295X.104.2.211" rel="nofollow" target="_blank">Landauer & Dumais, 1997</a>) had a few hundred latent dimensions and were trained on roughly 30,000 documents – already enough to pass synonym tests at near-human levels. Word2Vec (<a href="https://doi.org/10.48550/arXiv.1301.3781" rel="nofollow" target="_blank">Mikolov et al., 2013</a>) moved to shallow neural networks with a few million learnable parameters trained on billions of words. Then came the Transformer-based models, and the scale exploded: BERT (<a href="https://doi.org/10.18653/v1/N19-1423" rel="nofollow" target="_blank">Devlin et al., 2019</a>) had 340 million parameters, GPT-3 (<a href="https://doi.org/10.48550/arXiv.2005.14165" rel="nofollow" target="_blank">Brown et al., 2020</a>) reached 175 billion, and today’s largest models are estimated at well over a trillion parameters, trained on text corpora so vast they encompass a substantial fraction of everything ever written on the internet.</p>
<p>The core principle has not changed: predict the next word on the basis of the company it keeps. What has changed is capacity. A model with 20 dimensions and 10,000 words can distinguish finance from energy; a model with billions of parameters and trillions of training tokens can distinguish a Shakespearean sonnet from a legal brief, track the implications of a subordinate clause across a 3,000-word passage and generate fluent prose in dozens of languages. Generative AI was not built on a fundamentally new idea about language – it was built by scaling Firth’s old idea up by many orders of magnitude and combining it with a crucial algorithmic innovation.</p>
</div>
<div id="the-transformer-revolution" class="section level2">
<h2>The Transformer Revolution</h2>
<p>That algorithmic innovation was the Transformer, introduced by <a href="https://doi.org/10.48550/arXiv.1706.03762" rel="nofollow" target="_blank">Vaswani et al.</a> in their 2017 paper ‘Attention Is All You Need’. Earlier language models relied on recurrent or convolutional neural networks, which processed words sequentially – reading a sentence one word at a time while trying to hold everything so far in memory. The approach worked, after a fashion, but it was slow and struggled with long-range dependencies.</p>
<p>The Transformer replaced all of that with <em>multi-head self-attention</em>: a mechanism that lets the model weigh every word in a passage simultaneously, comparing each one directly with every other. In plain terms, attention allows the model to ask, for each word, ‘which other words here matter most for understanding me?’ The idea is simple but transformative. It outperformed existing models on translation and a host of other tasks – without any recurrence or convolution – and was far faster to train in parallel.</p>
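<p>The core computation is small enough to sketch in base R. The matrices <code>Q</code>, <code>K</code> and <code>V</code> below are random stand-ins for the learned query, key and value projections of one attention head; real Transformers learn these projections and run many heads in parallel.</p>

```r
# Scaled dot-product attention (Vaswani et al., 2017), single head, base R.
softmax_rows <- function(m) {
  e <- exp(m - apply(m, 1, max))  # subtract each row's max for stability
  e / rowSums(e)
}

attention <- function(Q, K, V) {
  scores  <- (Q %*% t(K)) / sqrt(ncol(K))  # every token scored against every other
  weights <- softmax_rows(scores)          # rows sum to 1: 'which words matter for me?'
  weights %*% V                            # weighted mix of value vectors
}

set.seed(1)
n_tokens <- 4; d <- 8
Q <- matrix(rnorm(n_tokens * d), n_tokens)
K <- matrix(rnorm(n_tokens * d), n_tokens)
V <- matrix(rnorm(n_tokens * d), n_tokens)

out <- attention(Q, K, V)
dim(out)  # one updated vector per token
```

<p>Because every pairwise score is a matrix product, the whole passage is processed at once rather than word by word – which is what made training on massive corpora feasible.</p>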
<p>With Transformers in hand, NLP entered a new era. Large pretrained models like BERT (<a href="https://doi.org/10.18653/v1/N19-1423" rel="nofollow" target="_blank">Devlin et al., 2019</a>) and the GPT series (<a href="https://doi.org/10.48550/arXiv.2005.14165" rel="nofollow" target="_blank">Brown et al., 2020</a>) set successive benchmarks for language understanding and generation. The combination of the Transformer architecture with the massive scale described above – hundreds of billions of parameters trained on essentially the whole internet – is what made generative AI possible. From Firth’s insight about co-occurrence, through LSA’s matrix decompositions and Word2Vec’s neural embeddings, to the attention-powered behemoths of today, the thread is continuous: predict the next word on the basis of the company it keeps. But despite their extraordinary power, these models remain <em>predictors of text</em>, not infallible oracles of truth. The Transformer revolution made the storyteller more eloquent; it did not make the storyteller more honest.</p>
</div>
<div id="fluency-is-not-truth" class="section level2">
<h2>Fluency Is Not Truth</h2>
<p>Crucially, LLMs are optimised for fluency, not truth. They have no built-in fact-checking; they simply predict plausible continuations. As <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf" rel="nofollow" target="_blank">Radford et al. (2019)</a> showed with GPT-2, the training objective is straightforward: learn to predict the next token in a sequence, given all preceding tokens. The loss function rewards fluent, likely text – but it never rewards the model for replying ‘I don’t know.’ <a href="https://doi.org/10.18653/v1/2022.acl-long.229" rel="nofollow" target="_blank">Lin et al. (2022)</a> demonstrated with their TruthfulQA benchmark that models frequently produce confident but false answers rather than admitting uncertainty, and that larger models can actually perform <em>worse</em> on truthfulness because they are better at reproducing convincing-sounding misinformation from their training data. The upshot: models tend to guess when unsure, and they guess with alarming confidence.</p>
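<p>A toy calculation (all probabilities invented) makes the asymmetry concrete: the loss at each position is the negative log-probability the model assigned to the token that actually came next, so a confident correct guess is cheap, while there is no term anywhere that rewards expressing uncertainty.</p>

```r
# Next-token objective on made-up numbers: suppose the model's predicted
# distribution over the next token, given 'the cat ...', is:
p_next <- c(the = 0.05, cat = 0.10, sat = 0.60, mat = 0.15, was = 0.10)

loss_confident_right <- -log(p_next["sat"])  # true next token was "sat"
loss_low_confidence  <- -log(0.01)           # true token got only 1% probability

loss_confident_right  # small loss: confident and correct
loss_low_confidence   # large loss: the objective punishes spread-out probability
```

<p>Hedging mass onto an 'I don't know' continuation only raises the loss on whatever token actually follows – which is one intuition for why models guess, and guess confidently.</p>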
<p>Consider what this means in practice. Ask an LLM about a niche historical event, and it may cheerfully invent plausible-sounding details – dates, names, citations – that are entirely fabricated. Ask it about a scientific finding at the edges of its training data, and it may blend two real studies into one fictional hybrid, complete with a convincing journal name. This phenomenon, known as <em>hallucination</em>, is not a bug that will eventually be patched away; it is a structural feature of how these models work. <a href="https://doi.org/10.48550/arXiv.2401.11817" rel="nofollow" target="_blank">Xu et al. (2024)</a> demonstrated formally that if an LLM cannot reliably distinguish true from false statements in its training data, hallucinations are mathematically inevitable. The model’s very fluency becomes its greatest liability: it weaves a convincing narrative whether or not the underlying facts support it. In short, current LLMs are trained to be good <em>storytellers</em>, not guaranteed <em>truth-tellers</em>.</p>
</div>
<div id="why-prompts-matter-and-why-one-is-rarely-enough" class="section level2">
<h2>Why Prompts Matter – and Why One Is Rarely Enough</h2>
<p>Because of this predictive nature, prompt engineering is essential. A vague or generic question will often yield a superficial, off‑target or simply wrong answer. One <a href="https://cloud.google.com/discover/what-is-prompt-engineering" rel="nofollow" target="_blank">guide defines prompt engineering</a> as ‘the art and science of designing and optimising prompts to guide AI models towards generating the desired responses’. That sounds rather grand, but in practice it often means something as prosaic as adding context, specifying a format, giving an example or two, and then refining iteratively until the output is actually useful.</p>
<p>The sensitivity of LLMs to phrasing is remarkable – and, on first encounter, a little humbling. Asking ‘What are some criticisms of capitalism?’ and ‘What are the main drawbacks of market economies?’ can elicit strikingly different responses, even though the questions are conceptually near‑identical. <a href="https://doi.org/10.48550/arXiv.2310.11324" rel="nofollow" target="_blank">Sclar et al. (2024)</a> showed that even tiny changes – swapping a single word, reordering a clause, adding an explicit instruction to be concise – can dramatically alter what a model produces, with performance varying by up to 76 percentage points across prompt formats for the same task. <a href="https://doi.org/10.1038/s41746-024-01029-4" rel="nofollow" target="_blank">Wang et al. (2024)</a> found that well-engineered prompts can yield ‘ideal and stable answers’, but that different formulations can have very different effects on performance. Researchers testing LLMs rigorously often try dozens of prompt variants to achieve reliable output. A single query is, in most cases, simply not enough.</p>
<p>There is also the matter of <em>role</em>, <em>tone</em> and <em>constraints</em>. Instructing the model to respond as a sceptical scientist, a sympathetic teacher or a meticulous copy‑editor changes its behaviour markedly. Asking it to respond in plain English, to avoid jargon, to stay under 150 words or to number its assumptions shapes the answer in ways a bare question never could. Each of these additions is, in Firth’s terms, part of the ‘company’ the prompt keeps – and consequently part of what determines the model’s response.</p>
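<p>These ingredients can be made concrete with a small template helper. Everything here is illustrative: <code>build_prompt</code> is a made-up function, not any particular library's API, and the role and constraints are just strings prepended and appended to the question.</p>

```r
# A purely illustrative prompt-template helper: role, question, constraints.
build_prompt <- function(question, role = NULL, constraints = character(0)) {
  parts <- c(
    if (!is.null(role)) paste0("You are ", role, "."),
    question,
    if (length(constraints) > 0)
      paste("Constraints:", paste(constraints, collapse = "; "))
  )
  paste(parts, collapse = "\n")
}

cat(build_prompt(
  "What are the main drawbacks of market economies?",
  role = "a sceptical economist",
  constraints = c("plain English", "under 150 words", "number your assumptions")
))
```

<p>Swapping the role or dropping a constraint changes the 'company' the question keeps – and, as the studies above show, potentially the answer.</p>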
</div>
<div id="what-good-prompting-looks-like" class="section level2">
<h2>What Good Prompting Looks Like</h2>
<p>The savvy user treats an LLM as a collaborator requiring careful, iterative guidance – not a search engine that delivers verdicts on demand. Wherever you can, provide ample background information, and invite the model to ask you questions before it responds. Treat its first reply as a draft, not a conclusion. It is worth asking follow‑up questions, pushing back on suspect claims, requesting sources or alternative views, and rephrasing when the model goes off track. Ask it to explain its reasoning. Ask it to consider counter‑arguments. Ask it to flag what it is uncertain about. Each move draws more of the model’s latent capability to the surface.</p>
<p>This iterative approach mirrors good intellectual practice more generally. A scientist does not run one experiment and publish; they replicate, vary conditions and triangulate across methods. A journalist does not accept a single source; they seek corroboration. A doctor does not diagnose on one symptom; they gather a fuller picture. Using an LLM well requires the same instinct: treat each exchange as one data point in an ongoing investigation, not as the final word.</p>
</div>
<div id="a-powerful-tool-not-an-oracle" class="section level2">
<h2>A Powerful Tool, Not an Oracle</h2>
<p>Using an LLM is a bit like navigating a foreign city without a map: you will stumble upon genuinely useful places, but you will also take wrong turns, end up in dead ends, and occasionally find yourself confidently heading in exactly the wrong direction. These models will often produce accurate information, because language genuinely encodes reality – words cluster around what they describe, and texts about geography, commodity markets or sensory properties track how the world actually works (<a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" rel="nofollow" target="_blank">Louwerse & Zwaan, 2009</a>; <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" rel="nofollow" target="_blank">Louwerse, 2011</a>). But an LLM is not inherently geared towards truth: hallucinations are not a bug to be patched but a mathematical inevitability of the architecture (<a href="https://doi.org/10.48550/arXiv.2401.11817" rel="nofollow" target="_blank">Xu et al., 2024</a>). The underlying mechanism is still word co-occurrence – Firth’s old principle, scaled up. Neither the brute force of massive training data (<a href="https://doi.org/10.1177/09637214241242746" rel="nofollow" target="_blank">Connell & Lynott, 2024</a>) nor the ingenious attention mechanisms of modern architectures (<a href="https://doi.org/10.48550/arXiv.1706.03762" rel="nofollow" target="_blank">Vaswani et al., 2017</a>) has <em>yet</em> tamed this heuristic machine into a reliable truth-teller.</p>
<p>Good results take deliberate effort. A well-crafted prompt – with specific context, clear constraints, iterative refinement and healthy scepticism – does not transform the model into a truth engine. What it does is steer its predictions towards the regions of language that most faithfully reflect the world. Skip that effort, and arriving at the right destination becomes a matter of luck rather than design.</p>
<p>Firth’s insight about words applies equally to prompts: you shall know an answer by the company the question keeps (<a href="https://doi.org/10.48550/arXiv.2411.10541" rel="nofollow" target="_blank">He et al., 2024</a>; <a href="https://doi.org/10.1038/s41746-024-01029-4" rel="nofollow" target="_blank">Wang et al., 2024</a>).</p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p>Bernabeu, P. (2022). <em>Language and sensorimotor simulation in conceptual processing: Multilevel analysis and statistical power</em> [Doctoral thesis, Lancaster University]. <a href="https://doi.org/10.17635/lancaster/thesis/1795" class="uri" rel="nofollow" target="_blank">https://doi.org/10.17635/lancaster/thesis/1795</a></p>
<p>Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. In <em>Advances in Neural Information Processing Systems</em> (Vol. 33, pp. 1877–1901). <a href="https://doi.org/10.48550/arXiv.2005.14165" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2005.14165</a></p>
<p>Brunila, M., & LaViolette, J. (2022). What company do words keep? Revisiting the distributional semantics of J.R. Firth & Zellig Harris. <em>Proceedings of NAACL 2022</em>. <a href="https://doi.org/10.18653/v1/2022.naacl-main.327" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/2022.naacl-main.327</a></p>
<p>Connell, L., & Lynott, D. (2024). What can language models tell us about human cognition? <em>Current Directions in Psychological Science</em>. <a href="https://doi.org/10.1177/09637214241242746" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1177/09637214241242746</a></p>
<p>Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In <em>Proceedings of NAACL-HLT 2019</em> (pp. 4171–4186). Association for Computational Linguistics. <a href="https://doi.org/10.18653/v1/N19-1423" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/N19-1423</a></p>
<p>Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. <em>Journal of Statistical Software, 25</em>(5), 1–54. <a href="https://doi.org/10.18637/jss.v025.i05" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18637/jss.v025.i05</a></p>
<p>Firth, J. R. (1957). <em>Studies in Linguistic Analysis</em>. Basil Blackwell.</p>
<p>Günther, F., Dudschig, C., & Kaup, B. (2016). LSAfun: An R package for computations based on Latent Semantic Analysis. <em>Behavior Research Methods, 48</em>(2), 409–421. <a href="https://doi.org/10.3758/s13428-015-0662-x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.3758/s13428-015-0662-x</a></p>
<p>He, J., Rungta, M., Koleczek, D., Sekhon, A., Wang, F. X., & Hasan, S. (2024). Does prompt formatting have any impact on LLM performance? <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2411.10541" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2411.10541</a></p>
<p>Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. <em>Psychological Review, 104</em>(2), 211–240. <a href="https://doi.org/10.1037/0033-295X.104.2.211" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1037/0033-295X.104.2.211</a></p>
<p>Lewis, D. D. (1997). <em>Reuters-21578 text categorization test collection, distribution 1.0</em> [Dataset]. AT&T Bell Laboratories. <a href="http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html" class="uri" rel="nofollow" target="_blank">http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html</a></p>
<p>Lin, S., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring how models mimic human falsehoods. In <em>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</em> (pp. 3214–3252). <a href="https://doi.org/10.18653/v1/2022.acl-long.229" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/2022.acl-long.229</a></p>
<p>Louwerse, M. M. (2011). Symbol interdependency in symbolic and embodied cognition. <em>Topics in Cognitive Science, 3</em>(2), 273–302. <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1756-8765.2010.01106.x</a></p>
<p>Louwerse, M., & Connell, L. (2011). A taste of words: Linguistic context and perceptual simulation predict the modality of words. <em>Cognitive Science, 35</em>(2), 381–398. <a href="https://doi.org/10.1111/j.1551-6709.2010.01157.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1551-6709.2010.01157.x</a></p>
<p>Louwerse, M. M., & Zwaan, R. A. (2009). Language encodes geographical information. <em>Cognitive Science, 33</em>(1), 51–73. <a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1551-6709.2008.01003.x</a></p>
<p>Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. <em>Behavior Research Methods, Instruments, & Computers, 28</em>(2), 203–208. <a href="https://doi.org/10.3758/BF03204766" class="uri" rel="nofollow" target="_blank">https://doi.org/10.3758/BF03204766</a></p>
<p>Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.1301.3781" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.1301.3781</a></p>
<p>Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. <em>OpenAI Blog</em>. <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf" class="uri" rel="nofollow" target="_blank">https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf</a></p>
<p>Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2024). Quantifying language models’ sensitivity to spurious features in prompt design. In <em>Proceedings of ICLR 2024</em>. <a href="https://doi.org/10.48550/arXiv.2310.11324" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2310.11324</a></p>
<p>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In <em>Advances in Neural Information Processing Systems</em> (Vol. 30). <a href="https://doi.org/10.48550/arXiv.1706.03762" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.1706.03762</a></p>
<p>Wang, L., Chen, X., Deng, X., Wen, H., You, M., Liu, W., Li, Q., & Li, J. (2024). Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. <em>NPJ Digital Medicine, 7</em>, Article 41. <a href="https://doi.org/10.1038/s41746-024-01029-4" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1038/s41746-024-01029-4</a></p>
<p>Wu, M., Conde, J., Reviriego, P., & Brysbaert, M. (2026). How does fine-tuning improve sensorimotor representations in large language models? <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2603.03313" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2603.03313</a></p>
<p>Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2401.11817" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2401.11817</a></p>
<p>Xu, Q., Peng, Y., Nastase, S. A., Chodorow, M., Wu, M., & Li, P. (2025). Large language models without grounding recover non-sensorimotor but not sensorimotor features of human concepts. <em>Nature Human Behaviour, 9</em>(9), 1871–1886. <a href="https://doi.org/10.1038/s41562-025-02203-8" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1038/s41562-025-02203-8</a></p>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/"> Pablo Bernabeu</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/">You shall know a word by the company it keeps — so choose your prompts wisely</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400072</post-id>	</item>
		<item>
		<title>Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</title>
		<link>https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/</link>
		
		<dc:creator><![CDATA[rprogrammingbooks]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 20:48:03 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rprogrammingbooks.com/?p=2528</guid>

					<description><![CDATA[<p>Digital Biology with R Digital biology is no longer a niche intersection between biology and computation. It has become a core framework for how modern laboratories, biomedical teams, and translational researchers generate insight from complex biological systems. Whether the objective is to identify gene-expression signatures, model disease progression, classify patient ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rprogrammingbooks.com/digital-biology-with-r/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=digital-biology-with-r"> Blog - R Programming Books</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <meta name="description" content="A professional macro-post on digital biology with R, covering bioinformatics workflows, predictive modeling, time series analysis, transcriptomics, visualization, and reproducible research in modern life sciences." />
  <meta name="keywords" content="digital biology with R, bioinformatics in R, transcriptomics R, predictive modeling medical data, time series analysis with R, healthcare analytics with R, Bioconductor, DESeq2, ggplot2, computational biology, biological data science" />
  <meta name="author" content="OpenAI" />
  <title>Digital Biology with R</title>
  <style>
    body {
      font-family: Arial, Helvetica, sans-serif;
      line-height: 1.75;
      color: #1f2937;
      max-width: 1100px;
      margin: 0 auto;
      padding: 40px 24px;
      background: #ffffff;
    }
    h2, h3, h4 {
      color: #0f172a;
      margin-top: 36px;
      margin-bottom: 14px;
    }
    p {
      margin-bottom: 18px;
    }
    pre {
      background: #0b1020;
      color: #e5e7eb;
      padding: 18px;
      overflow-x: auto;
      border-radius: 10px;
      margin: 22px 0;
      font-size: 14px;
      line-height: 1.55;
    }
    code {
      font-family: Consolas, Monaco, monospace;
    }
    a {
      color: #1d4ed8;
      text-decoration: none;
    }
    a:hover {
      text-decoration: underline;
    }
    .lead {
      font-size: 1.08rem;
      color: #374151;
    }
    .note {
      background: #f8fafc;
      border-left: 4px solid #2563eb;
      padding: 16px 18px;
      margin: 20px 0;
      border-radius: 8px;
    }
    .section {
      margin-bottom: 28px;
    }
    ul {
      margin: 14px 0 20px 24px;
    }
    li {
      margin-bottom: 8px;
    }
    .closing {
      background: #f9fafb;
      padding: 22px;
      border-radius: 12px;
      margin-top: 34px;
    }
  </style>
</head>
<body>

  <div class="section">
    <p class="lead">
      Digital biology is no longer a niche intersection between biology and computation. It has become a core framework for how modern laboratories, biomedical teams, and translational researchers generate insight from complex biological systems. Whether the objective is to identify gene-expression signatures, model disease progression, classify patient subgroups, or study temporal changes in biological signals, the ability to work fluently with data is now inseparable from the practice of advanced life science.
    </p>

    <p>
      In this context, <strong>R</strong> remains one of the most powerful and professionally relevant environments for biological data science. Its strengths go far beyond general statistics. R provides a mature ecosystem for reproducible analysis, publication-grade visualization, predictive modeling, medical data interpretation, and high-dimensional biological workflows. For teams working across transcriptomics, clinical analytics, systems biology, or longitudinal biosignal analysis, digital biology with R offers both depth and flexibility.
    </p>

    <p>
      A serious digital biology workflow in R typically combines several capabilities at once: structured data import, metadata harmonization, exploratory analysis, statistical modeling, machine learning, time-aware biological interpretation, and clear communication of findings. This is precisely why concepts associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank">predictive modeling for medical data in R</a>
      and
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">time series analysis with R</a>
      are becoming increasingly relevant in computational biology. Even professionals whose core focus is omics data benefit from thinking more broadly about biomedical prediction and temporal biological structure.
    </p>

    <div class="note">
      <p>
        From a strategic learning perspective, this is one reason why resources such as
        <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank"><strong>Healthcare Analytics with R: Predictive Modeling for Medical Data</strong></a>
        and
        <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank"><strong>Time Series Analysis with R</strong></a>
        fit naturally into a digital biology skill set. Even when the application is not purely clinical or purely forecasting-oriented, both domains strengthen the analytical mindset required for modern biological data interpretation.
      </p>
    </div>
  </div>

  <div class="section">
    <h2>Why R is a Professional Standard in Digital Biology</h2>

    <p>
      The case for R in digital biology is not simply historical. It is practical. Biological datasets are noisy, heterogeneous, high-dimensional, and deeply contextual. Unlike generic analytics workflows, biological interpretation demands tools that can handle structured experimental design, repeated measurements, batch effects, sparse signals, and biologically meaningful visualization. R is exceptionally strong in these areas.
    </p>

    <p>
      Several features explain its enduring relevance:
    </p>

    <ul>
      <li>Rich statistical foundations for biological inference</li>
      <li>Outstanding visualization via packages such as <code>ggplot2</code></li>
      <li>Robust bioinformatics infrastructure through <code>Bioconductor</code></li>
      <li>Flexible modeling for clinical and biomedical prediction</li>
      <li>Excellent support for reproducible research and reporting</li>
      <li>Strong support for longitudinal and time-dependent data analysis</li>
    </ul>

    <p>
      In other words, R is not merely a coding language for scientists. It is a full analytical environment for translating biological complexity into evidence.
    </p>
  </div>

  <div class="section">
    <h2>Core Setup for a Digital Biology Workflow in R</h2>

    <p>
      Any professional analysis should begin with a clean, explicit computational environment. This improves reproducibility, allows collaborators to review assumptions, and reduces hidden sources of variation. Below is a practical setup that combines general data science tools with packages often used in transcriptomics, statistical learning, and biological visualization.
    </p>

<pre># Core data wrangling and visualization
library(tidyverse)

# Bioinformatics packages
library(DESeq2)
library(pheatmap)
library(limma)
library(edgeR)

# Statistical learning and modeling
library(caret)
library(glmnet)
library(randomForest)

# Time-aware analysis
library(forecast)
library(tsibble)
library(fable)

# Annotation and interpretation
library(clusterProfiler)
library(org.Hs.eg.db)

# Helpful utilities
library(broom)
library(ggrepel)
library(pROC)

set.seed(123)

theme_set(
  theme_minimal(base_size = 13) +
    theme(
      plot.title = element_text(face = &quot;bold&quot;),
      axis.title = element_text(face = &quot;bold&quot;),
      panel.grid.minor = element_blank()
    )
)
</pre>

    <p>
      This package combination reflects a wider truth about digital biology with R: modern workflows are often hybrid. A project may start with RNA-seq counts, then move into clinical prediction, then require temporal modeling of follow-up measurements. The strongest analysts are increasingly those who can connect these stages seamlessly rather than treating them as separate disciplines.
    </p>
  </div>

  <div class="section">
    <h2>Importing Biological and Clinical Data</h2>

    <p>
      High-quality analysis begins with structured data ingestion. In digital biology, it is common to work with at least two linked datasets: a feature matrix and a metadata table. In transcriptomics, the feature matrix may contain genes by samples. In biomedical prediction, it may contain biomarkers, laboratory values, imaging scores, or derived molecular features. The metadata usually includes conditions, treatment groups, demographic variables, batch identifiers, time points, and outcomes.
    </p>

<pre># Read count matrix and sample metadata
counts &lt;- read.csv(&quot;gene_counts.csv&quot;, row.names = 1, check.names = FALSE)
metadata &lt;- read.csv(&quot;sample_metadata.csv&quot;, row.names = 1)

# Ensure samples align
metadata &lt;- metadata[colnames(counts), , drop = FALSE]

# Inspect dimensions
dim(counts)
dim(metadata)

# Preview data
head(counts[, 1:6])
head(metadata)

# Basic integrity checks
stopifnot(all(colnames(counts) == rownames(metadata)))
sum(is.na(counts))
sum(is.na(metadata))

# Explore metadata structure
str(metadata)
table(metadata$condition)
table(metadata$batch)
table(metadata$timepoint)
</pre>

    <p>
      At this stage, professionals should pause and inspect structure rather than rushing into modeling. Many downstream problems can be prevented here: sample misalignment, inconsistent labels, unbalanced groups, missing covariates, and silent import errors. In biological work, methodological discipline begins before the first plot is drawn.
    </p>
  </div>

  <div class="section">
    <h2>Quality Control and Filtering of Biological Features</h2>

    <p>
      Biological datasets often include features with very low information content. In RNA-seq, genes with extremely low counts contribute noise and inflate multiple-testing burden. In medical datasets, biomarkers with near-zero variance or severe missingness can destabilize models. Filtering is therefore not a cosmetic step. It is part of the inferential foundation.
    </p>

<pre># Total reads per sample
library_sizes &lt;- colSums(counts)
sort(library_sizes)

# Visualize library size distribution
library_df &lt;- tibble(
  sample = names(library_sizes),
  total_counts = library_sizes
)

ggplot(library_df, aes(x = reorder(sample, total_counts), y = total_counts)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Library Size per Sample&quot;,
    x = &quot;Sample&quot;,
    y = &quot;Total Counts&quot;
  )

# Filter low-count genes
keep_genes &lt;- rowSums(counts &gt;= 10) &gt;= 3
counts_filtered &lt;- counts[keep_genes, ]

dim(counts_filtered)

# Optional: identify highly variable genes after transformation
log_counts &lt;- log2(counts_filtered + 1)
gene_variance &lt;- apply(log_counts, 1, var)

hv_genes &lt;- names(sort(gene_variance, decreasing = TRUE))[1:500]
length(hv_genes)
head(hv_genes)
</pre>

    <p>
      This step is often underestimated, yet it reflects one of the core principles of rigorous digital biology: not every measured variable deserves equal inferential attention. Careful preprocessing improves stability, interpretability, and signal detection.
    </p>
  </div>

  <div class="section">
    <h2>Differential Expression Analysis with DESeq2</h2>

    <p>
      A cornerstone task in digital biology is identifying features that differ systematically across biological conditions. In gene expression analysis, this is typically addressed with differential expression models. In R, <code>DESeq2</code> remains a leading framework because it pairs negative-binomial count modeling with robust normalization and clear inferential outputs.
    </p>

<pre># Build DESeq2 object
dds &lt;- DESeqDataSetFromMatrix(
  countData = counts_filtered,
  colData = metadata,
  design = ~ batch + condition
)

# Run differential expression pipeline
dds &lt;- DESeq(dds)

# Extract normalized counts
norm_counts &lt;- counts(dds, normalized = TRUE)

# Results for treatment vs control
res &lt;- results(dds, contrast = c(&quot;condition&quot;, &quot;treated&quot;, &quot;control&quot;))

# Order by adjusted p-value
res_ordered &lt;- res[order(res$padj), ]
res_df &lt;- as.data.frame(res_ordered) %&gt;%
  rownames_to_column(&quot;gene&quot;)

head(res_df)

# Summary of results
summary(res)

# Significant genes
sig_res &lt;- res_df %&gt;%
  filter(!is.na(padj), padj &lt; 0.05, abs(log2FoldChange) &gt; 1)

nrow(sig_res)
head(sig_res)
</pre>

    <p>
      The logic here is deeply aligned with professional bioinformatics practice. We are not merely searching for large fold changes. We are modeling expression while accounting for dispersion, library size, and design structure. When analysts speak about reliable biological signal, this statistical scaffolding is what makes the claim credible.
    </p>
  </div>

  <div class="section">
    <h2>Variance Stabilization and Exploratory Biological Patterns</h2>

    <p>
      Raw counts are appropriate for inference within count models, but transformed values are often more useful for exploratory analysis, clustering, and visualization. Variance stabilization helps reveal sample-level patterns that are obscured in count scale.
    </p>

<pre># Variance stabilizing transformation
vsd &lt;- vst(dds, blind = FALSE)
vsd_mat &lt;- assay(vsd)

# Principal component analysis
pca_data &lt;- plotPCA(vsd, intgroup = c(&quot;condition&quot;, &quot;batch&quot;), returnData = TRUE)
percent_var &lt;- round(100 * attr(pca_data, &quot;percentVar&quot;))

ggplot(pca_data, aes(PC1, PC2, color = condition, shape = batch, label = name)) +
  geom_point(size = 4) +
  geom_text_repel(size = 3.5, max.overlaps = 20) +
  labs(
    title = &quot;PCA of Variance-Stabilized Expression Data&quot;,
    x = paste0(&quot;PC1: &quot;, percent_var[1], &quot;% variance&quot;),
    y = paste0(&quot;PC2: &quot;, percent_var[2], &quot;% variance&quot;)
  )

# Sample-to-sample distance heatmap
sample_dists &lt;- dist(t(vsd_mat))
sample_dist_matrix &lt;- as.matrix(sample_dists)

pheatmap(
  sample_dist_matrix,
  clustering_distance_rows = sample_dists,
  clustering_distance_cols = sample_dists,
  main = &quot;Sample Distance Heatmap&quot;
)
</pre>

    <p>
      PCA and clustering are not just aesthetic additions. They answer fundamental questions: Do biological groups separate? Is there evidence of batch structure? Are any samples acting as outliers? In practice, these plots often determine whether a project moves forward confidently or returns to quality assessment.
    </p>
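    <p>
      As a minimal, self-contained sketch of the outlier question (simulated values, not the <code>vsd</code>-based objects computed above), one robust heuristic is to flag any sample whose median distance to all other samples sits far above the typical level:
    </p>

<pre># Simulated example: flag potential outlier samples from a distance matrix
# (illustrative only; in a real analysis, reuse the vsd-derived distances)
set.seed(42)
expr_sim &lt;- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
expr_sim[, 10] &lt;- expr_sim[, 10] + 3  # shift sample 10 to create an outlier

d_mat &lt;- as.matrix(dist(t(expr_sim)))
med_dist &lt;- apply(d_mat, 1, median)

# Robust threshold: median plus three median absolute deviations
threshold &lt;- median(med_dist) + 3 * mad(med_dist)
which(med_dist &gt; threshold)  # should flag the shifted sample
</pre>

    <p>
      The same three-line check transfers directly to the distance matrix built from real expression data, and it complements, rather than replaces, visual inspection of the PCA and heatmap.
    </p>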
  </div>

  <div class="section">
    <h2>Volcano Plots and Expression Heatmaps</h2>

    <p>
      Communication matters in digital biology. If results cannot be clearly visualized, they cannot be effectively interpreted, reviewed, or shared. Volcano plots and heatmaps remain two of the most useful ways to summarize differential signal.
    </p>

<pre># Volcano plot
volcano_df &lt;- res_df %&gt;%
  mutate(
    significance = case_when(
      !is.na(padj) &amp; padj &lt; 0.05 &amp; log2FoldChange &gt; 1  ~ &quot;Upregulated&quot;,
      !is.na(padj) &amp; padj &lt; 0.05 &amp; log2FoldChange &lt; -1 ~ &quot;Downregulated&quot;,
      TRUE ~ &quot;Not significant&quot;
    ),
    neg_log10_padj = -log10(padj)
  )

ggplot(volcano_df, aes(x = log2FoldChange, y = neg_log10_padj, color = significance)) +
  geom_point(alpha = 0.75) +
  geom_vline(xintercept = c(-1, 1), linetype = &quot;dashed&quot;) +
  geom_hline(yintercept = -log10(0.05), linetype = &quot;dashed&quot;) +
  labs(
    title = &quot;Volcano Plot of Differential Expression&quot;,
    x = &quot;Log2 Fold Change&quot;,
    y = &quot;-Log10 Adjusted P-value&quot;
  )

# Heatmap of top significant genes
top_genes &lt;- sig_res %&gt;%
  slice_min(order_by = padj, n = 30) %&gt;%
  pull(gene)

heatmap_mat &lt;- vsd_mat[top_genes, ]
heatmap_mat_scaled &lt;- t(scale(t(heatmap_mat)))

annotation_col &lt;- metadata %&gt;%
  select(condition, batch)

pheatmap(
  heatmap_mat_scaled,
  annotation_col = annotation_col,
  show_rownames = TRUE,
  show_colnames = TRUE,
  clustering_method = &quot;complete&quot;,
  main = &quot;Top Differentially Expressed Genes&quot;
)
</pre>

    <p>
      A strong digital biology report does not overwhelm the reader with raw output. Instead, it synthesizes significance, directionality, effect size, and group structure into visuals that support biological reasoning.
    </p>
  </div>

  <div class="section">
    <h2>From Omics to Biomedical Prediction</h2>

    <p>
      One of the most valuable evolutions in digital biology is the move from descriptive molecular analysis toward predictive modeling. This is where the boundary between bioinformatics and biomedical analytics becomes especially productive. Biological features can be used not only to explain differences between groups, but also to classify disease status, estimate risk, predict response, or support clinical stratification.
    </p>

    <p>
      This broader perspective is exactly why themes associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank"><strong>Healthcare Analytics with R</strong></a>
      are increasingly relevant to life scientists. Predictive modeling for medical data is not separate from digital biology. In many modern projects, it is the next analytical step after feature selection and biological characterization.
    </p>

<pre># Example: prepare a classification dataset using selected genes
selected_genes &lt;- sig_res %&gt;%
  slice_min(order_by = padj, n = 50) %&gt;%
  pull(gene)

model_df &lt;- as.data.frame(t(vsd_mat[selected_genes, ])) %&gt;%
  rownames_to_column(&quot;sample&quot;) %&gt;%
  left_join(
    # keep only the outcome column: other metadata (batch, timepoint, ...)
    # would otherwise leak into the predictors via condition ~ . - sample
    metadata %&gt;% rownames_to_column(&quot;sample&quot;) %&gt;% select(sample, condition),
    by = &quot;sample&quot;
  )

# Convert outcome to factor
model_df$condition &lt;- factor(model_df$condition)

# Train/test split
set.seed(123)
train_index &lt;- createDataPartition(model_df$condition, p = 0.8, list = FALSE)
train_data &lt;- model_df[train_index, ]
test_data  &lt;- model_df[-train_index, ]

# Logistic regression with regularization
x_train &lt;- model.matrix(condition ~ . - sample, data = train_data)[, -1]
y_train &lt;- train_data$condition

x_test &lt;- model.matrix(condition ~ . - sample, data = test_data)[, -1]
y_test &lt;- test_data$condition

cv_fit &lt;- cv.glmnet(
  x = x_train,
  y = y_train,
  family = &quot;binomial&quot;,
  alpha = 1
)

best_lambda &lt;- cv_fit$lambda.min
best_lambda

pred_prob &lt;- predict(cv_fit, newx = x_test, s = &quot;lambda.min&quot;, type = &quot;response&quot;)
pred_class &lt;- ifelse(pred_prob &gt; 0.5, levels(y_train)[2], levels(y_train)[1]) %&gt;%
  factor(levels = levels(y_train))

confusionMatrix(pred_class, y_test)

# ROC curve
roc_obj &lt;- roc(response = y_test, predictor = as.numeric(pred_prob))
auc(roc_obj)

plot(roc_obj, main = &quot;ROC Curve for Biomarker-Based Classification&quot;)
</pre>

    <p>
      This workflow illustrates an important professional principle: biological significance and predictive utility are related but not identical. A feature may be statistically different yet add little prediction. Conversely, a stable predictive combination may emerge from multiple modest features. Analysts in digital biology must be comfortable evaluating both dimensions.
    </p>
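    <p>
      The distinction can be made concrete with a small simulation (invented values, unrelated to the objects above): a modest mean shift is highly significant in a large sample, yet the same feature discriminates only weakly. Base R suffices, using the rank-sum statistic as an AUC estimate:
    </p>

<pre># Simulated feature with a small but real group difference
set.seed(1)
n &lt;- 500
group &lt;- rep(c(&quot;control&quot;, &quot;treated&quot;), each = n)
x &lt;- rnorm(2 * n, mean = ifelse(group == &quot;treated&quot;, 0.3, 0))

# Highly significant difference between groups...
t.test(x ~ group)$p.value

# ...but only modest discrimination: AUC from the Wilcoxon statistic
w &lt;- wilcox.test(x[group == &quot;treated&quot;], x[group == &quot;control&quot;])$statistic
auc_est &lt;- as.numeric(w) / (n * n)
auc_est  # well below the range usually considered clinically useful
</pre>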
  </div>

  <div class="section">
    <h2>Model Interpretation and Feature Importance</h2>

    <p>
      Predictive models become more useful when they can be interpreted responsibly. In biomedical contexts, this matters for scientific credibility, stakeholder communication, and eventual translational relevance.
    </p>

<pre># Random forest example
rf_model &lt;- randomForest(
  condition ~ . - sample,
  data = train_data,
  importance = TRUE,
  ntree = 500
)

rf_pred &lt;- predict(rf_model, newdata = test_data)
confusionMatrix(rf_pred, y_test)

# Variable importance
importance_df &lt;- importance(rf_model) %&gt;%
  as.data.frame() %&gt;%
  rownames_to_column(&quot;feature&quot;) %&gt;%
  arrange(desc(MeanDecreaseGini))

head(importance_df, 15)

ggplot(importance_df %&gt;% slice_max(order_by = MeanDecreaseGini, n = 15),
       aes(x = reorder(feature, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Top Features by Random Forest Importance&quot;,
    x = &quot;Feature&quot;,
    y = &quot;Mean Decrease Gini&quot;
  )
</pre>

    <p>
      In practice, interpretability is not a single metric. It is a disciplined process of relating selected variables back to biological mechanisms, assay characteristics, experimental design, and disease context. This is where statistical maturity and domain understanding must meet.
    </p>
  </div>

  <div class="section">
    <h2>Time Series Analysis in Digital Biology</h2>

    <p>
      Not all biological processes are static snapshots. Many of the most interesting systems in biology unfold over time: circadian rhythms, immune response trajectories, treatment adaptation, tumor evolution, metabolic fluctuations, neural signals, and longitudinal patient outcomes. For this reason,
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank"><strong>time series analysis with R</strong></a>
      is increasingly valuable in digital biology.
    </p>

    <p>
      The ability to model biological variation across time expands analysis beyond cross-sectional comparison. It enables trend detection, seasonality assessment, smoothing, short-term forecasting, and dynamic interpretation of living systems. Even a foundational understanding of temporal modeling can dramatically improve how a researcher handles repeated biological measurements.
    </p>

<pre># Example: longitudinal biomarker measurements
biomarker_ts &lt;- read.csv(&quot;biomarker_time_series.csv&quot;)

head(biomarker_ts)

# Suppose columns: date, patient_id, biomarker_value
biomarker_ts$date &lt;- as.Date(biomarker_ts$date)

# Aggregate mean biomarker value by date
daily_signal &lt;- biomarker_ts %&gt;%
  group_by(date) %&gt;%
  summarise(mean_value = mean(biomarker_value, na.rm = TRUE)) %&gt;%
  arrange(date)

ggplot(daily_signal, aes(x = date, y = mean_value)) +
  geom_line() +
  labs(
    title = &quot;Average Biomarker Signal Over Time&quot;,
    x = &quot;Date&quot;,
    y = &quot;Mean Biomarker Value&quot;
  )

# Convert to a time series object
# (frequency = 7 assumes daily observations with a weekly cycle;
#  adjust this to match the actual sampling design)
signal_ts &lt;- ts(daily_signal$mean_value, frequency = 7)

# Decomposition
signal_decomp &lt;- stl(signal_ts, s.window = &quot;periodic&quot;)
plot(signal_decomp)

# Automatic ARIMA model
fit_arima &lt;- auto.arima(signal_ts)
summary(fit_arima)

# Forecast next 14 periods
signal_forecast &lt;- forecast(fit_arima, h = 14)
plot(signal_forecast)
</pre>

    <p>
      This kind of analysis is highly relevant when biological response is not instantaneous. In translational research, temporal behavior may be more informative than a single endpoint. This is why learning patterns associated with
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">a practical guide to modeling and forecasting in R</a>
      can be unexpectedly powerful for biologists, especially when studying dynamic phenotypes.
    </p>
  </div>

  <div class="section">
    <h2>Gene-Level Temporal Analysis</h2>

    <p>
      Time-dependent biology also appears in omics. For example, gene expression may be measured before treatment, during exposure, and after recovery. In such cases, one can examine temporal structure directly at the feature level.
    </p>

<pre># Example metadata with repeated time points
# (DESeq2 and the tidyverse are required for this block)
library(DESeq2)
library(tidyverse)

metadata$timepoint &lt;- factor(metadata$timepoint, levels = c(&quot;T0&quot;, &quot;T1&quot;, &quot;T2&quot;, &quot;T3&quot;))

dds_time &lt;- DESeqDataSetFromMatrix(
  countData = counts_filtered,
  colData = metadata,
  design = ~ patient_id + timepoint
)

dds_time &lt;- DESeq(dds_time)

# Compare T3 vs T0
res_time &lt;- results(dds_time, contrast = c(&quot;timepoint&quot;, &quot;T3&quot;, &quot;T0&quot;))
res_time_df &lt;- as.data.frame(res_time) %&gt;%
  rownames_to_column(&quot;gene&quot;) %&gt;%
  filter(!is.na(padj)) %&gt;%
  arrange(padj)

head(res_time_df)

# Plot trajectories for selected genes
trajectory_genes &lt;- res_time_df %&gt;%
  slice_min(order_by = padj, n = 6) %&gt;%
  pull(gene)

traj_df &lt;- vsd_mat[trajectory_genes, ] %&gt;%
  as.data.frame() %&gt;%
  rownames_to_column(&quot;gene&quot;) %&gt;%
  pivot_longer(-gene, names_to = &quot;sample&quot;, values_to = &quot;expression&quot;) %&gt;%
  left_join(metadata %&gt;% rownames_to_column(&quot;sample&quot;), by = &quot;sample&quot;)

ggplot(traj_df, aes(x = timepoint, y = expression, group = patient_id, color = patient_id)) +
  geom_line(alpha = 0.7) +
  geom_point() +
  facet_wrap(~ gene, scales = &quot;free_y&quot;) +
  labs(
    title = &quot;Gene Expression Trajectories Across Time&quot;,
    x = &quot;Time Point&quot;,
    y = &quot;Variance-Stabilized Expression&quot;
  )
</pre>

    <p>
      These trajectory plots are especially informative because they convert abstract significance into temporal biological behavior. They help answer questions such as whether a gene responds early, accumulates gradually, reverses later, or varies strongly between individuals.
    </p>
  </div>

  <div class="section">
    <h2>Functional Interpretation and Pathway Enrichment</h2>

    <p>
      Lists of significant genes are not the final product of digital biology. They are intermediate artifacts. Real insight emerges when molecular changes are interpreted in the context of biological pathways, cellular functions, and disease mechanisms.
    </p>

<pre># Convert gene symbols to ENTREZ IDs
# (clusterProfiler and the org.Hs.eg.db annotation package are required)
library(clusterProfiler)
library(org.Hs.eg.db)

gene_symbols &lt;- sig_res$gene

gene_map &lt;- bitr(
  gene_symbols,
  fromType = &quot;SYMBOL&quot;,
  toType = &quot;ENTREZID&quot;,
  OrgDb = org.Hs.eg.db
)

head(gene_map)

# GO enrichment
ego &lt;- enrichGO(
  gene = gene_map$ENTREZID,
  OrgDb = org.Hs.eg.db,
  ont = &quot;BP&quot;,
  pAdjustMethod = &quot;BH&quot;,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.05,
  readable = TRUE
)

head(as.data.frame(ego))

# Dotplot
dotplot(ego, showCategory = 15)

# KEGG enrichment
ekegg &lt;- enrichKEGG(
  gene = gene_map$ENTREZID,
  organism = &quot;hsa&quot;,
  pvalueCutoff = 0.05
)

head(as.data.frame(ekegg))
barplot(ekegg, showCategory = 10)
</pre>

    <p>
      Pathway-level interpretation anchors the analysis in biology rather than leaving it at the level of statistical output. This is essential when communicating results to collaborators in wet-lab biology, medicine, translational research, or biotech development.
    </p>
  </div>

  <div class="section">
    <h2>Reproducibility, Reporting, and Professional Standards</h2>

    <p>
      One of the defining marks of professional digital biology is reproducibility. Analyses should be re-runnable, traceable, and explainable. In practice, this means using scripts instead of manual spreadsheets, versioning code, recording package versions, and structuring outputs clearly.
    </p>

<pre># Save key results
write.csv(res_df, &quot;deseq2_results_full.csv&quot;, row.names = FALSE)
write.csv(sig_res, &quot;deseq2_significant_genes.csv&quot;, row.names = FALSE)
write.csv(importance_df, &quot;model_feature_importance.csv&quot;, row.names = FALSE)

# Save transformed matrix
write.csv(as.data.frame(vsd_mat), &quot;variance_stabilized_expression.csv&quot;)

# Session information
sessionInfo()
</pre>
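
    <p>
      Recording <code>sessionInfo()</code> output helps, but package versions can also be pinned per project. A common approach is the <code>renv</code> package; the calls below are a minimal sketch and assume <code>renv</code> is installed.
    </p>

<pre># Pin package versions with renv (sketch; requires the renv package)
# install.packages(&quot;renv&quot;)

renv::init()      # create a project-local library and lockfile
renv::snapshot()  # record current package versions in renv.lock
renv::restore()   # later, or on another machine: reinstall recorded versions</pre>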

    <p>
      Many high-quality biological analyses fail to create lasting value because they are difficult to audit or reproduce. Strong R workflows help solve that problem. This is one reason digital biology with R continues to matter not just for discovery, but for scientific integrity.
    </p>
  </div>

  <div class="section">
    <h2>Strategic Perspective: Why Digital Biology Needs Both Prediction and Time</h2>

    <p>
      There is a broader lesson running through all of these workflows. Modern digital biology is no longer limited to one analytical mindset. It requires the integration of molecular inference, biomedical prediction, and dynamic temporal thinking. A researcher who can identify differentially expressed genes but cannot evaluate predictive performance is incomplete. A modeler who can classify patients but ignores longitudinal structure may miss the real biology. A statistician who can forecast signals but cannot relate them to biological pathways risks analytical abstraction without scientific relevance.
    </p>

    <p>
      This is why the most valuable R skill set in life sciences increasingly spans multiple domains. A foundation in bioinformatics remains essential, but it becomes even more powerful when complemented by competencies associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank">healthcare analytics with R</a>
      and
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">time series modeling and forecasting in R</a>.
      That combination reflects the real direction of modern computational biology.
    </p>
  </div>

  <div class="closing">
    <h2>Conclusion</h2>

    <p>
      Digital biology with R is not just about coding. It is about building a disciplined analytical framework for understanding living systems through data. From transcriptomics and pathway analysis to medical prediction and temporal biomarker modeling, R provides the professional infrastructure needed to move from raw measurements to defensible insight.
    </p>

    <p>
      The future of biological research belongs increasingly to people who can connect statistical rigor, computational reproducibility, and biological interpretation. In that environment, skills related to <strong>bioinformatics in R</strong>, <strong>predictive modeling for medical data</strong>, and <strong>time series analysis with R</strong> are not separate tracks. They are complementary pillars of modern digital biology.
    </p>

    <p>
      For scientists, analysts, clinicians, and interdisciplinary teams looking to strengthen that capability, learning how to combine these approaches is one of the smartest investments they can make. R remains one of the best places to do that work.
    </p>
  </div>

</body>
</html>
<p>The post <a href="https://rprogrammingbooks.com/digital-biology-with-r/" rel="nofollow" target="_blank">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a> appeared first on <a href="https://rprogrammingbooks.com/" rel="nofollow" target="_blank">R Programming Books</a>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rprogrammingbooks.com/digital-biology-with-r/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=digital-biology-with-r"> Blog - R Programming Books</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400051</post-id>	</item>
		<item>
		<title>Using science to find the best decaf</title>
		<link>https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/</link>
		
		<dc:creator><![CDATA[Giles Dickenson-Jones]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 04:03:08 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.gilesd-j.com/?p=4177</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> TLDR: To test whether I could tell the difference between decaf coffees I conducted a highly scientific test (subject to […]<br />
The post Using science to find the best decaf appeared first on Giles.</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/">Using science to find the best decaf</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/"> Data Analytics and AI Archives - Giles</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p class=""><strong>TLDR:</strong> To test whether I could tell the difference between decaf coffees I conducted a highly scientific test (subject to funding constraints).</p>



<p class="">One of my goals for 2025 was reducing my caffeine intake after having one too many sleepless nights. The problem was that all the decaffeinated coffee I’d tried was terrible.</p>



<p class="">Or was it?</p>



<p class="">After all, I once called myself an audiophile until a <a href="https://www.npr.org/sections/therecord/2015/06/02/411473508/how-well-can-you-hear-audio-quality" rel="nofollow" target="_blank">series of A/B tests</a> suggested I couldn’t tell the difference between tracks encoded at different bitrates. So, it was entirely possible that I’d been brainwashed by <em>big coffee </em>to believe decaf coffee was inferior.</p>



<p class="">But, how exactly could we test this out?</p>



<p class="">The most obvious solution was to run a cross-country randomized double-blind experiment. This way, I wouldn’t automatically base my assessment on the caffeinated status of the coffee <em>and</em> could instead focus on my subjective rating of the quality of each coffee.</p>



<p class="">Which is pretty much what I did:</p>



<h2 class="wp-block-heading">Step 1: Sample selection</h2>



<p class="">The first step was to select a wide enough sample of coffee beans to make the study as sciency as possible. Roping in my wife to help out, we purchased as many decaf varieties as we could get our hands on.</p>



<h2 class="wp-block-heading">Step 2: Sample blinding</h2>



<p class="">After selecting a large representative sample of coffees (n=6), I packed a sample of each in its own container (pictured). To obscure each coffee’s identity I assigned them a number from 1 to 6. To further enhance the science I then had my wife assign new numbers so neither of us knew the origin of each sample.</p>
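
<p class="">The two-round relabelling is easy to sketch in base R (the labels below are hypothetical; ours were assigned by hand):</p>

<pre># Double-blind labelling sketch (illustrative; real labels were hand-assigned)
set.seed(1)
coffees &lt;- paste0(&quot;coffee_&quot;, 1:6)

round_1 &lt;- sample(1:6)          # my numbering
round_2 &lt;- sample(1:6)          # my wife’s re-numbering of my numbers

# Neither round alone maps a cup back to its coffee;
# the combined key is needed to unblind the results
blinding_key &lt;- data.frame(coffee = coffees,
                           blind_label_round_1 = round_1,
                           blind_label_round_2 = round_2[round_1])</pre>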



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144.jpeg?w=450&#038;ssl=1" alt="" class="wp-image-4179" style="width:405px;height:auto" srcset_temp="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144.jpeg?w=450&#038;ssl=1 675w, https://www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144-300x267.jpeg 300w" sizes="auto, (max-width: 675px) 100vw, 675px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<h2 class="wp-block-heading">Step 3: Testing</h2>



<p class="">Before starting the test I cleaned and descaled the coffee machine. Beans from each container were freshly ground at room temperature and used to make six separate espressos. Shots were drawn on a quasi-random basis according to whatever my wife handed to me. We then took a sip of each coffee and ranked our preferences from one to six.</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="407" height="321" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450.jpeg?resize=407%2C321&#038;ssl=1" alt="" class="wp-image-4181" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450.jpeg?resize=407%2C321&#038;ssl=1 407w, https://www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450-300x237.jpeg 300w" sizes="auto, (max-width: 407px) 100vw, 407px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<h2 class="wp-block-heading">Step 4: Results</h2>



<p class="">Although I’d have liked to pre-register my research, none of the top econometric journals I contacted expressed interest. However, my running assumption was that our preferences for a coffee were mainly psychological and had little to do with its caffeine content.</p>



<p class="">If this were true, I’d expect to see no relationship between our rankings. But, to my surprise, this didn’t appear to be the case. Instead, we both ranked the beans in a similar order:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image.png?w=450&#038;ssl=1" alt="" class="wp-image-4183" style="width:633px;height:auto" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image.png?w=450&#038;ssl=1 745w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-300x217.png 300w" sizes="auto, (max-width: 745px) 100vw, 745px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class=""><strong>Code Snippet:</strong></p>



<pre>#load libraries and import data
library(tidyverse)
dta_coffee_science&lt;-read_csv(&quot;./Data/250216 blind coffee ratings.csv&quot;)

# Show linear association between samples by assigned label
#reverse axis so lower rankings are higher on the axis scale
plt_rankings_by_coffee_no &lt;- ggplot(data = dta_coffee_science,
                                  aes(y = ranking_person_b, x = ranking_person_a)) +
 geom_text(aes(label = blind_label_round_2), size = 3.5) +
 scale_y_reverse(name = &quot;Person B Ranking (1 = Best)&quot;) +
 scale_x_reverse(name = &quot;Person A Ranking (1 = Best)&quot;) + 
 labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
      subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
      caption = &quot;Note: Lower numbers indicate higher preference&quot;) +
 theme_classic()


plt_rankings_by_coffee_no</pre>



<p class="">Of course, we’re doing some real science here, so to check, let’s apply Kendall’s tau and Spearman’s rank correlation tests against the null hypothesis that there is no association between our rankings.</p>



<p class=""><strong>Code Snippet:</strong></p>



<pre>#kendall
cor.test(data=dta_coffee_science, 
         ~ ranking_person_a + ranking_person_b, method = &quot;kendall&quot;)

#spearman
cor.test(data=dta_coffee_science, 
         ~ ranking_person_a + ranking_person_b, method = &quot;spearman&quot;)</pre>



<p class="">With p-values from six to eight percent, this isn’t a ringing endorsement of the results, but having already written the blog I’m happy to adjust my definition of significant to conclude our preferences were similar to one another.</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="259" height="365" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/significant.png?resize=259%2C365&#038;ssl=1" alt="" class="wp-image-4185" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/significant.png?resize=259%2C365&#038;ssl=1 259w, https://www.gilesd-j.com/wp-content/uploads/2026/03/significant-213x300.png 213w" sizes="auto, (max-width: 259px) 100vw, 259px" data-recalc-dims="1" /><figcaption class="wp-element-caption"><a href="https://xkcd.com/1478/" rel="nofollow" target="_blank">Source</a></figcaption></figure>



<p class="">Of course, my willingness to play fast and loose with the stats also stems from knowing a key result: <em>we both ranked the store-bought caffeinated beans highest.</em></p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-1.png?w=450&#038;ssl=1" alt="" class="wp-image-4187" style="width:605px;height:auto" srcset_temp="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-1.png?w=450&#038;ssl=1 745w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-1-300x217.png 300w" sizes="auto, (max-width: 745px) 100vw, 745px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class=""><strong>Code Snippet: </strong></p>



<pre># Show linear association between samples by assigned label and caffeination status 
plt_rankings_by_caffeine &lt;-ggplot(data=dta_coffee_science,
                     aes(y=ranking_person_b, x=ranking_person_a,col=decaf))+
 geom_text(aes(label = blind_label_round_2), size = 3.5) +
  scale_y_reverse(limits = c(7.5, 0.5)) +   
  scale_x_reverse(limits = c(7.5, 0.5)) +  
    coord_cartesian(clip = &quot;off&quot;) +  
 labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
      subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
      caption = &quot;Note: Lower numbers indicate higher preference&quot;,
      x=&quot;Person A Ranking (1 = Best)&quot;, 
      y=&quot;Person B Ranking (1 = Best)&quot;) +
 theme_classic()+
 scale_color_manual(values = c(&quot;black&quot;, &quot;blue&quot;), name = &quot;Decaf:&quot;) 

plt_rankings_by_caffeine</pre>



<p class="">I also found it surprising that the beans from a specialized provider of decaf weren’t necessarily ranked higher, with only one of their beans ranked in the top three:</p>



<p class=""><strong>Code Snippet:</strong></p>



<pre># Show linear association between samples by assigned label and caffeination status with original labels
plt_rankings_by_caffeine_named &lt;- ggplot(data = dta_coffee_science,
                     aes(y = ranking_person_b, x = ranking_person_a, col = decaf)) +
  geom_text(aes(label = str_wrap(paste0(coffee_brand, &quot;: &quot;, coffee_name), width = 15)), 
            size = 3.5, lineheight = 0.8) +
  scale_y_reverse(limits = c(7.5, 0.5)) +   
  scale_x_reverse(limits = c(7.5, 0.5)) +   
  coord_cartesian(clip = &quot;off&quot;) +            # stop clipping text at panel border
  labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
       subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
       caption = &quot;Note: Lower numbers indicate higher preference&quot;,
       x = &quot;Person A Ranking (1 = Best)&quot;, 
       y = &quot;Person B Ranking (1 = Best)&quot;) +
  theme_classic() +
  theme(plot.margin = margin(10, 60, 10, 60)) +  
  scale_color_manual(values = c(&quot;black&quot;, &quot;blue&quot;), name = &quot;Decaffeinated&quot;)

plt_rankings_by_caffeine_named</pre>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i0.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-2.png?w=450&#038;ssl=1" alt="" class="wp-image-4189" style="width:633px;height:auto" srcset_temp="https://i0.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-2.png?w=450&#038;ssl=1 755w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-2-300x234.png 300w" sizes="auto, (max-width: 755px) 100vw, 755px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class="">And while <a href="https://www.gourmettraveller.com.au/dining-out/food-news/does-decaf-coffee-taste-as-good-as-regular-coffee-2733/" rel="nofollow" target="_blank">these coffee nerds</a> might disagree, the results suggest we can tell the difference between coffees and both prefer the caffeinated alternative.</p>



<p class="">When I recounted the result to a food chemist they told me this probably has something to do with decaf coffee lacking <a href="https://journals.sagepub.com/doi/10.1177/0003489420906187" rel="nofollow" target="_blank">the bitterness of caffeine</a>. </p>



<p class="">When I recounted the results to my wife, she told me to never waste her time like this again. I probably will. </p>



<p class="">In the spirit of open science, you can download the dataset <a href="https://www.gilesd-j.com/shared_resources/blogs/260323%20Coffee/250216%20blind%20coffee%20ratings.csv" rel="nofollow" target="_blank">here</a>.</p>



<p class=""></p>
<p>The post <a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/" rel="nofollow" target="_blank">Using science to find the best decaf</a> appeared first on <a href="https://www.gilesd-j.com/" rel="nofollow" target="_blank">Giles</a>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/"> Data Analytics and AI Archives - Giles</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/">Using science to find the best decaf</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400045</post-id>	</item>
		<item>
		<title>odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</title>
		<link>https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/</link>
		
		<dc:creator><![CDATA[R-Blog on Data modelling to develop ...]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 01:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">/r-blog/2026-03-23-r-markdown/oddsratio/</guid>

					<description><![CDATA[<p>Introduction<br />
Model tuning and estimation has evolved from simple extrapolation to sophisticated probabilistic modeling frameworks. In contemporary data science, decision-makers require more than estimates, the need for clear statements about likeliho...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/">odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jobnmadu.com/r-blog/2026-03-23-r-markdown/oddsratio/"> R-Blog on Data modelling to develop ...</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>Model tuning and estimation have evolved from simple extrapolation to sophisticated probabilistic modeling frameworks. In contemporary data science, decision-makers require more than point estimates: clear statements about likelihood, risk, and uncertainty are critical to sound decision-making. Yet despite advances in predictive modeling, it remains hard to make quick, robust decisions from estimated probabilities, because the estimates are often not meaningfully interpreted. This is where the odds_summary function becomes strategically important. It converts probabilistic outputs into structured summaries that directly support:</p>
<blockquote>
<p>• decision-making</p>
</blockquote>
<blockquote>
<p>• risk communication</p>
</blockquote>
<blockquote>
<p>• model validation</p>
</blockquote>
<blockquote>
<p>• reproducible research</p>
</blockquote>
<p>In practical terms, it turns numbers into evidence.</p>
<p>The function is implemented as follows:</p>
<blockquote>
<p>odds_summary(model)</p>
</blockquote>
<p>where:</p>
<p><code>model</code>: a fitted model object for one of the supported model types. For now, only <code>glm</code>, <code>multinom</code>, and <code>polr</code> models are covered.</p>
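<p>To read the output, note that the odds ratio is simply the exponentiated coefficient, and the percentage column expresses that ratio as a percent change in the odds. A quick base R check (using the rounded InflMedium coefficient from the table below; small differences from the table are rounding):</p>
<pre>b &lt;- 0.566                            # rounded coefficient
odds_ratio &lt;- exp(b)                  # about 1.76 (Odds_ratio column)
pct_change &lt;- (odds_ratio - 1) * 100  # about 76% (% column)</pre>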
<div id="implement-the-function" class="section level2">
<h2>Implement the function</h2>
<pre>library(Dyn4cast)
library(tidyverse)</pre>
<div id="ordered-logistic-model" class="section level3">
<h3>Ordered Logistic Model</h3>
<pre>library(MASS)
options(contrasts = c(&quot;contr.treatment&quot;, &quot;contr.poly&quot;))
house.plr &lt;- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
modelsummary::datasummary_df(odds_summary(house.plr))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_0y93rh7rqqwcli5evcjb = TinyTable.createTableFunctions("tinytable_0y93rh7rqqwcli5evcjb");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '8', j: 1 }, { i: '8', j: 2 }, { i: '8', j: 3 }, { i: '8', j: 4 }, { i: '8', j: 5 }, { i: '8', j: 6 }, { i: '8', j: 7 }, { i: '8', j: 8 }, { i: '8', j: 9 }, { i: '8', j: 10 }, { i: '8', j: 11 } ], css_id: 'tinytable_css_kqdivy1ouvxmjyvd8rvd',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '5', j: 1 }, { i: '6', j: 1 }, { i: '7', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '5', j: 2 }, { i: '6', j: 2 }, { i: '7', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '5', j: 3 }, { i: '6', j: 3 }, { i: '7', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '5', j: 4 }, { i: '6', j: 4 }, { i: '7', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '5', j: 5 }, { i: '6', j: 5 }, { i: '7', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '5', j: 6 }, { i: '6', j: 6 }, { i: '7', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '5', j: 7 }, { i: '6', j: 7 }, { i: '7', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '5', j: 8 }, { i: '6', j: 8 }, { i: '7', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '5', j: 9 }, { i: '6', j: 9 }, { i: '7', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '5', j: 10 }, { i: '6', j: 10 }, { i: '7', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 }, { i: '5', j: 11 }, { i: '6', j: 11 }, { i: '7', j: 11 } ], css_id: 'tinytable_css_0n5siff8v55citkwicfb',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_wkqke0b7agbredbwastx',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_0y93rh7rqqwcli5evcjb.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_kqdivy1ouvxmjyvd8rvd, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_kqdivy1ouvxmjyvd8rvd { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_0n5siff8v55citkwicfb, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_0n5siff8v55citkwicfb { text-align: left }
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_wkqke0b7agbredbwastx, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_wkqke0b7agbredbwastx { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_0y93rh7rqqwcli5evcjb" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">InflMedium</td>
<td data-row="1" data-col="2">0.57</td>
<td data-row="1" data-col="3">0.10</td>
<td data-row="1" data-col="4">5.41</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.566***</td>
<td data-row="1" data-col="7">1.76</td>
<td data-row="1" data-col="8">76.19</td>
<td data-row="1" data-col="9">1.762***</td>
<td data-row="1" data-col="10">1.44</td>
<td data-row="1" data-col="11">2.16</td>
</tr>
<tr>
<td data-row="2" data-col="1">InflHigh</td>
<td data-row="2" data-col="2">1.29</td>
<td data-row="2" data-col="3">0.13</td>
<td data-row="2" data-col="4">10.14</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.289***</td>
<td data-row="2" data-col="7">3.63</td>
<td data-row="2" data-col="8">262.85</td>
<td data-row="2" data-col="9">3.628***</td>
<td data-row="2" data-col="10">2.83</td>
<td data-row="2" data-col="11">4.66</td>
</tr>
<tr>
<td data-row="3" data-col="1">TypeApartment</td>
<td data-row="3" data-col="2">-0.57</td>
<td data-row="3" data-col="3">0.12</td>
<td data-row="3" data-col="4">-4.80</td>
<td data-row="3" data-col="5">0.00</td>
<td data-row="3" data-col="6">-0.572***</td>
<td data-row="3" data-col="7">0.56</td>
<td data-row="3" data-col="8">-43.58</td>
<td data-row="3" data-col="9">0.564***</td>
<td data-row="3" data-col="10">0.45</td>
<td data-row="3" data-col="11">0.71</td>
</tr>
<tr>
<td data-row="4" data-col="1">TypeAtrium</td>
<td data-row="4" data-col="2">-0.37</td>
<td data-row="4" data-col="3">0.16</td>
<td data-row="4" data-col="4">-2.36</td>
<td data-row="4" data-col="5">0.02</td>
<td data-row="4" data-col="6">-0.366*</td>
<td data-row="4" data-col="7">0.69</td>
<td data-row="4" data-col="8">-30.66</td>
<td data-row="4" data-col="9">0.693*</td>
<td data-row="4" data-col="10">0.51</td>
<td data-row="4" data-col="11">0.94</td>
</tr>
<tr>
<td data-row="5" data-col="1">TypeTerrace</td>
<td data-row="5" data-col="2">-1.09</td>
<td data-row="5" data-col="3">0.15</td>
<td data-row="5" data-col="4">-7.20</td>
<td data-row="5" data-col="5">0.00</td>
<td data-row="5" data-col="6">-1.091***</td>
<td data-row="5" data-col="7">0.34</td>
<td data-row="5" data-col="8">-66.41</td>
<td data-row="5" data-col="9">0.336***</td>
<td data-row="5" data-col="10">0.25</td>
<td data-row="5" data-col="11">0.45</td>
</tr>
<tr>
<td data-row="6" data-col="1">ContHigh</td>
<td data-row="6" data-col="2">0.36</td>
<td data-row="6" data-col="3">0.10</td>
<td data-row="6" data-col="4">3.77</td>
<td data-row="6" data-col="5">0.00</td>
<td data-row="6" data-col="6">0.36***</td>
<td data-row="6" data-col="7">1.43</td>
<td data-row="6" data-col="8">43.37</td>
<td data-row="6" data-col="9">1.434***</td>
<td data-row="6" data-col="10">1.19</td>
<td data-row="6" data-col="11">1.73</td>
</tr>
<tr>
<td data-row="7" data-col="1">Low|Medium</td>
<td data-row="7" data-col="2">-0.50</td>
<td data-row="7" data-col="3">0.12</td>
<td data-row="7" data-col="4">-3.97</td>
<td data-row="7" data-col="5">0.00</td>
<td data-row="7" data-col="6">-0.496***</td>
<td data-row="7" data-col="7">0.61</td>
<td data-row="7" data-col="8">-39.11</td>
<td data-row="7" data-col="9">0.609***</td>
<td data-row="7" data-col="10">-0.74</td>
<td data-row="7" data-col="11">-0.25</td>
</tr>
<tr>
<td data-row="8" data-col="1">Medium|High</td>
<td data-row="8" data-col="2">0.69</td>
<td data-row="8" data-col="3">0.13</td>
<td data-row="8" data-col="4">5.50</td>
<td data-row="8" data-col="5">0.00</td>
<td data-row="8" data-col="6">0.691***</td>
<td data-row="8" data-col="7">2.00</td>
<td data-row="8" data-col="8">99.51</td>
<td data-row="8" data-col="9">1.995***</td>
<td data-row="8" data-col="10">0.44</td>
<td data-row="8" data-col="11">0.94</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
</div>
<div id="glm-models" class="section level2">
<h2>glm models</h2>
<pre>counts &lt;- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome &lt;- gl(3, 1, 9)
treatment &lt;- gl(3, 3)
ddc &lt;- data.frame(treatment, outcome, counts) # assemble the data frame
glm.D93 &lt;- glm(counts ~ ., data = ddc, family = poisson())
modelsummary::datasummary_df(odds_summary(glm.D93))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_12d3s66vdraiwt31oplu = TinyTable.createTableFunctions("tinytable_12d3s66vdraiwt31oplu");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '5', j: 1 }, { i: '5', j: 2 }, { i: '5', j: 3 }, { i: '5', j: 4 }, { i: '5', j: 5 }, { i: '5', j: 6 }, { i: '5', j: 7 }, { i: '5', j: 8 }, { i: '5', j: 9 }, { i: '5', j: 10 }, { i: '5', j: 11 } ], css_id: 'tinytable_css_daa6amfft7ltkyt107ix',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_kom2rjrasnbp39sx7eae',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_6xdoabik2wy8rhcbyouv',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_12d3s66vdraiwt31oplu.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_daa6amfft7ltkyt107ix, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_daa6amfft7ltkyt107ix { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_kom2rjrasnbp39sx7eae, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_kom2rjrasnbp39sx7eae { text-align: left }
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_6xdoabik2wy8rhcbyouv, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_6xdoabik2wy8rhcbyouv { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_12d3s66vdraiwt31oplu" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">3.04</td>
<td data-row="1" data-col="3">0.17</td>
<td data-row="1" data-col="4">17.81</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">3.045***</td>
<td data-row="1" data-col="7">21.00</td>
<td data-row="1" data-col="8">2000.00</td>
<td data-row="1" data-col="9">21***</td>
<td data-row="1" data-col="10">14.82</td>
<td data-row="1" data-col="11">28.98</td>
</tr>
<tr>
<td data-row="2" data-col="1">treatment2</td>
<td data-row="2" data-col="2">0.00</td>
<td data-row="2" data-col="3">0.20</td>
<td data-row="2" data-col="4">0.00</td>
<td data-row="2" data-col="5">1.00</td>
<td data-row="2" data-col="6">0</td>
<td data-row="2" data-col="7">1.00</td>
<td data-row="2" data-col="8">0.00</td>
<td data-row="2" data-col="9">1</td>
<td data-row="2" data-col="10">0.67</td>
<td data-row="2" data-col="11">1.48</td>
</tr>
<tr>
<td data-row="3" data-col="1">treatment3</td>
<td data-row="3" data-col="2">0.00</td>
<td data-row="3" data-col="3">0.20</td>
<td data-row="3" data-col="4">0.00</td>
<td data-row="3" data-col="5">1.00</td>
<td data-row="3" data-col="6">0</td>
<td data-row="3" data-col="7">1.00</td>
<td data-row="3" data-col="8">0.00</td>
<td data-row="3" data-col="9">1</td>
<td data-row="3" data-col="10">0.67</td>
<td data-row="3" data-col="11">1.48</td>
</tr>
<tr>
<td data-row="4" data-col="1">outcome2</td>
<td data-row="4" data-col="2">-0.45</td>
<td data-row="4" data-col="3">0.20</td>
<td data-row="4" data-col="4">-2.25</td>
<td data-row="4" data-col="5">0.02</td>
<td data-row="4" data-col="6">-0.454*</td>
<td data-row="4" data-col="7">0.63</td>
<td data-row="4" data-col="8">-36.51</td>
<td data-row="4" data-col="9">0.635*</td>
<td data-row="4" data-col="10">0.42</td>
<td data-row="4" data-col="11">0.94</td>
</tr>
<tr>
<td data-row="5" data-col="1">outcome3</td>
<td data-row="5" data-col="2">-0.29</td>
<td data-row="5" data-col="3">0.19</td>
<td data-row="5" data-col="4">-1.52</td>
<td data-row="5" data-col="5">0.13</td>
<td data-row="5" data-col="6">-0.293</td>
<td data-row="5" data-col="7">0.75</td>
<td data-row="5" data-col="8">-25.40</td>
<td data-row="5" data-col="9">0.746</td>
<td data-row="5" data-col="10">0.51</td>
<td data-row="5" data-col="11">1.09</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
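<p>As a quick sanity check (this snippet is not part of the original post), the <code>Odds_ratio</code> column can be reproduced by exponentiating the coefficients, and the confidence-interval columns appear to be <code>exp()</code> of the Wald limits:</p>
<pre>## sketch, assuming odds_summary() exponentiates Wald intervals
exp(coef(glm.D93))            # should match the Odds_ratio column
exp(confint.default(glm.D93)) # Wald CIs on the ratio scale</pre>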
<pre>anorex.1 &lt;- glm(Postwt ~ Prewt + Treat + offset(Prewt),
family = gaussian, data = anorexia
)
modelsummary::datasummary_df(odds_summary(anorex.1))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_awtmeapdm732006j9m4m = TinyTable.createTableFunctions("tinytable_awtmeapdm732006j9m4m");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '4', j: 1 }, { i: '4', j: 2 }, { i: '4', j: 3 }, { i: '4', j: 4 }, { i: '4', j: 5 }, { i: '4', j: 6 }, { i: '4', j: 7 }, { i: '4', j: 8 }, { i: '4', j: 9 }, { i: '4', j: 10 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_xeicyxiqkgxks4x5ayeh',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 } ], css_id: 'tinytable_css_uuwvfoveya2y8hjsxswq',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_8ceydmhnrxf6zqyzc4ky',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_awtmeapdm732006j9m4m.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_xeicyxiqkgxks4x5ayeh, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_xeicyxiqkgxks4x5ayeh { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_uuwvfoveya2y8hjsxswq, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_uuwvfoveya2y8hjsxswq { text-align: left }
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_8ceydmhnrxf6zqyzc4ky, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_8ceydmhnrxf6zqyzc4ky { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_awtmeapdm732006j9m4m" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">49.77</td>
<td data-row="1" data-col="3">13.39</td>
<td data-row="1" data-col="4">3.72</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">49.771***</td>
<td data-row="1" data-col="7">4.123994e+21</td>
<td data-row="1" data-col="8">4.123994e+23</td>
<td data-row="1" data-col="9">4.12399379732274e+21***</td>
<td data-row="1" data-col="10">1.647835e+10</td>
<td data-row="1" data-col="11">1.032101e+33</td>
</tr>
<tr>
<td data-row="2" data-col="1">Prewt</td>
<td data-row="2" data-col="2">-0.57</td>
<td data-row="2" data-col="3">0.16</td>
<td data-row="2" data-col="4">-3.51</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">-0.566***</td>
<td data-row="2" data-col="7">5.700000e-01</td>
<td data-row="2" data-col="8">-4.319000e+01</td>
<td data-row="2" data-col="9">0.568***</td>
<td data-row="2" data-col="10">4.100000e-01</td>
<td data-row="2" data-col="11">7.800000e-01</td>
</tr>
<tr>
<td data-row="3" data-col="1">TreatCont</td>
<td data-row="3" data-col="2">-4.10</td>
<td data-row="3" data-col="3">1.89</td>
<td data-row="3" data-col="4">-2.16</td>
<td data-row="3" data-col="5">0.03</td>
<td data-row="3" data-col="6">-4.097*</td>
<td data-row="3" data-col="7">2.000000e-02</td>
<td data-row="3" data-col="8">-9.834000e+01</td>
<td data-row="3" data-col="9">0.017*</td>
<td data-row="3" data-col="10">0.000000e+00</td>
<td data-row="3" data-col="11">6.800000e-01</td>
</tr>
<tr>
<td data-row="4" data-col="1">TreatFT</td>
<td data-row="4" data-col="2">4.56</td>
<td data-row="4" data-col="3">2.13</td>
<td data-row="4" data-col="4">2.14</td>
<td data-row="4" data-col="5">0.04</td>
<td data-row="4" data-col="6">4.563*</td>
<td data-row="4" data-col="7">9.588000e+01</td>
<td data-row="4" data-col="8">9.487670e+03</td>
<td data-row="4" data-col="9">95.877*</td>
<td data-row="4" data-col="10">1.460000e+00</td>
<td data-row="4" data-col="11">6.274970e+03</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>clotting &lt;- data.frame(
u = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18),
lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12)
)
lot1 &lt;- glm(lot1 ~ log(u), data = clotting, family = Gamma)
modelsummary::datasummary_df(odds_summary(lot1))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_11ilqce3cgtx6ja24i91 = TinyTable.createTableFunctions("tinytable_11ilqce3cgtx6ja24i91");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_ctvnze1pbzc7albeobq0',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_22tlklkjov7mxa9qllya',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_hkzvjfd73u5fjm8lxmct',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_11ilqce3cgtx6ja24i91.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_ctvnze1pbzc7albeobq0, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_ctvnze1pbzc7albeobq0 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_22tlklkjov7mxa9qllya, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_22tlklkjov7mxa9qllya { text-align: left }
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_hkzvjfd73u5fjm8lxmct, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_hkzvjfd73u5fjm8lxmct { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_11ilqce3cgtx6ja24i91" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">-0.02</td>
<td data-row="1" data-col="3">0.00</td>
<td data-row="1" data-col="4">-17.85</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">-0.017***</td>
<td data-row="1" data-col="7">0.98</td>
<td data-row="1" data-col="8">-1.64</td>
<td data-row="1" data-col="9">0.984***</td>
<td data-row="1" data-col="10">0.98</td>
<td data-row="1" data-col="11">0.99</td>
</tr>
<tr>
<td data-row="2" data-col="1">log(u)</td>
<td data-row="2" data-col="2">0.02</td>
<td data-row="2" data-col="3">0.00</td>
<td data-row="2" data-col="4">36.97</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">0.015***</td>
<td data-row="2" data-col="7">1.02</td>
<td data-row="2" data-col="8">1.55</td>
<td data-row="2" data-col="9">1.015***</td>
<td data-row="2" data-col="10">1.01</td>
<td data-row="2" data-col="11">1.02</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>lot2 &lt;- glm(lot2 ~ log(u), data = clotting, family = Gamma)
modelsummary::datasummary_df(odds_summary(lot2))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_a25ew0ljz2ydb62j7g81 = TinyTable.createTableFunctions("tinytable_a25ew0ljz2ydb62j7g81");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_2jnv1n9gua6mhl6wu7to',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_tmwjxxt1dvryppey3b43',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_4efkqa67vfag1dvw3cjg',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_a25ew0ljz2ydb62j7g81.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_2jnv1n9gua6mhl6wu7to, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_2jnv1n9gua6mhl6wu7to { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_tmwjxxt1dvryppey3b43, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_tmwjxxt1dvryppey3b43 { text-align: left }
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_4efkqa67vfag1dvw3cjg, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_4efkqa67vfag1dvw3cjg { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_a25ew0ljz2ydb62j7g81" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">-0.02</td>
<td data-row="1" data-col="3">0.00</td>
<td data-row="1" data-col="4">-18.02</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">-0.024***</td>
<td data-row="1" data-col="7">0.98</td>
<td data-row="1" data-col="8">-2.36</td>
<td data-row="1" data-col="9">0.976***</td>
<td data-row="1" data-col="10">0.97</td>
<td data-row="1" data-col="11">0.98</td>
</tr>
<tr>
<td data-row="2" data-col="1">log(u)</td>
<td data-row="2" data-col="2">0.02</td>
<td data-row="2" data-col="3">0.00</td>
<td data-row="2" data-col="4">40.92</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">0.024***</td>
<td data-row="2" data-col="7">1.02</td>
<td data-row="2" data-col="8">2.39</td>
<td data-row="2" data-col="9">1.024***</td>
<td data-row="2" data-col="10">1.02</td>
<td data-row="2" data-col="11">1.03</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>x &lt;- rnorm(100)
y &lt;- rpois(100, exp(1 + x))
lm2 &lt;- glm(y ~ x, family = quasi(variance = &quot;mu&quot;, link = &quot;log&quot;))
modelsummary::datasummary_df(odds_summary(lm2))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_re4al20x89j9zu51a5fl = TinyTable.createTableFunctions("tinytable_re4al20x89j9zu51a5fl");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_4f5y8vo4p4tyoqcio0no',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_7nqc4nte1l0g8p0vwhoe',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_m8fzsl9k55oweue3w4f2',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_re4al20x89j9zu51a5fl.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_4f5y8vo4p4tyoqcio0no, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_4f5y8vo4p4tyoqcio0no { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_7nqc4nte1l0g8p0vwhoe, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_7nqc4nte1l0g8p0vwhoe { text-align: left }
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_m8fzsl9k55oweue3w4f2, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_m8fzsl9k55oweue3w4f2 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_re4al20x89j9zu51a5fl" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">0.92</td>
<td data-row="1" data-col="3">0.06</td>
<td data-row="1" data-col="4">14.45</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.915***</td>
<td data-row="1" data-col="7">2.50</td>
<td data-row="1" data-col="8">149.71</td>
<td data-row="1" data-col="9">2.497***</td>
<td data-row="1" data-col="10">2.20</td>
<td data-row="1" data-col="11">2.82</td>
</tr>
<tr>
<td data-row="2" data-col="1">x</td>
<td data-row="2" data-col="2">1.05</td>
<td data-row="2" data-col="3">0.04</td>
<td data-row="2" data-col="4">29.75</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.053***</td>
<td data-row="2" data-col="7">2.87</td>
<td data-row="2" data-col="8">186.74</td>
<td data-row="2" data-col="9">2.867***</td>
<td data-row="2" data-col="10">2.67</td>
<td data-row="2" data-col="11">3.07</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>lm3 &lt;- glm(y ~ x, family = poisson)
modelsummary::datasummary_df(odds_summary(lm3))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_7lcbhbnn06e1wkeku720 = TinyTable.createTableFunctions("tinytable_7lcbhbnn06e1wkeku720");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_lsaliww65vddk863jva0',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_k39fgspo5jsxoplrce6t',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_3eodsi0kib610k7w4q15',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_7lcbhbnn06e1wkeku720.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_lsaliww65vddk863jva0, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_lsaliww65vddk863jva0 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_k39fgspo5jsxoplrce6t, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_k39fgspo5jsxoplrce6t { text-align: left }
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_3eodsi0kib610k7w4q15, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_3eodsi0kib610k7w4q15 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_7lcbhbnn06e1wkeku720" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">0.92</td>
<td data-row="1" data-col="3">0.07</td>
<td data-row="1" data-col="4">13.56</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.915***</td>
<td data-row="1" data-col="7">2.50</td>
<td data-row="1" data-col="8">149.71</td>
<td data-row="1" data-col="9">2.497***</td>
<td data-row="1" data-col="10">2.18</td>
<td data-row="1" data-col="11">2.84</td>
</tr>
<tr>
<td data-row="2" data-col="1">x</td>
<td data-row="2" data-col="2">1.05</td>
<td data-row="2" data-col="3">0.04</td>
<td data-row="2" data-col="4">27.92</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.053***</td>
<td data-row="2" data-col="7">2.87</td>
<td data-row="2" data-col="8">186.74</td>
<td data-row="2" data-col="9">2.867***</td>
<td data-row="2" data-col="10">2.66</td>
<td data-row="2" data-col="11">3.09</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
<div id="mlogit-models" class="section level2">
<h2>mlogit models</h2>
<pre>library(mlogit)
data(&quot;Fishing&quot;, package = &quot;mlogit&quot;)
Fish &lt;- dfidx(Fishing, varying = 2:9, shape = &quot;wide&quot;, choice = &quot;mode&quot;)
## a pure &quot;conditional&quot; model
mml &lt;- mlogit(mode ~ price + catch, data = Fish)
modelsummary::datasummary_df(odds_summary(mml))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_yo9xl3tdb0dmhpbubepa = TinyTable.createTableFunctions("tinytable_yo9xl3tdb0dmhpbubepa");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '5', j: 1 }, { i: '5', j: 2 }, { i: '5', j: 3 }, { i: '5', j: 4 }, { i: '5', j: 5 }, { i: '5', j: 6 }, { i: '5', j: 7 }, { i: '5', j: 8 }, { i: '5', j: 9 }, { i: '5', j: 10 }, { i: '5', j: 11 } ], css_id: 'tinytable_css_wbga30kl45m8t587d9i7',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_239hwaw935p9dz755qze',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_chmy26l12rxcl0kzjmvz',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_yo9xl3tdb0dmhpbubepa.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_wbga30kl45m8t587d9i7, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_wbga30kl45m8t587d9i7 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_239hwaw935p9dz755qze, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_239hwaw935p9dz755qze { text-align: left }
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_chmy26l12rxcl0kzjmvz, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_chmy26l12rxcl0kzjmvz { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_yo9xl3tdb0dmhpbubepa" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept):boat</td>
<td data-row="1" data-col="2">0.87</td>
<td data-row="1" data-col="3">0.11</td>
<td data-row="1" data-col="4">7.64</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.871***</td>
<td data-row="1" data-col="7">2.39</td>
<td data-row="1" data-col="8">139.02</td>
<td data-row="1" data-col="9">2.39***</td>
<td data-row="1" data-col="10">1.91</td>
<td data-row="1" data-col="11">2.99</td>
</tr>
<tr>
<td data-row="2" data-col="1">(Intercept):charter</td>
<td data-row="2" data-col="2">1.50</td>
<td data-row="2" data-col="3">0.13</td>
<td data-row="2" data-col="4">11.28</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.499***</td>
<td data-row="2" data-col="7">4.48</td>
<td data-row="2" data-col="8">347.67</td>
<td data-row="2" data-col="9">4.477***</td>
<td data-row="2" data-col="10">3.45</td>
<td data-row="2" data-col="11">5.81</td>
</tr>
<tr>
<td data-row="3" data-col="1">(Intercept):pier</td>
<td data-row="3" data-col="2">0.31</td>
<td data-row="3" data-col="3">0.11</td>
<td data-row="3" data-col="4">2.68</td>
<td data-row="3" data-col="5">0.01</td>
<td data-row="3" data-col="6">0.307**</td>
<td data-row="3" data-col="7">1.36</td>
<td data-row="3" data-col="8">35.94</td>
<td data-row="3" data-col="9">1.359**</td>
<td data-row="3" data-col="10">1.09</td>
<td data-row="3" data-col="11">1.70</td>
</tr>
<tr>
<td data-row="4" data-col="1">price</td>
<td data-row="4" data-col="2">-0.02</td>
<td data-row="4" data-col="3">0.00</td>
<td data-row="4" data-col="4">-14.54</td>
<td data-row="4" data-col="5">0.00</td>
<td data-row="4" data-col="6">-0.025***</td>
<td data-row="4" data-col="7">0.98</td>
<td data-row="4" data-col="8">-2.45</td>
<td data-row="4" data-col="9">0.976***</td>
<td data-row="4" data-col="10">0.97</td>
<td data-row="4" data-col="11">0.98</td>
</tr>
<tr>
<td data-row="5" data-col="1">catch</td>
<td data-row="5" data-col="2">0.38</td>
<td data-row="5" data-col="3">0.11</td>
<td data-row="5" data-col="4">3.43</td>
<td data-row="5" data-col="5">0.00</td>
<td data-row="5" data-col="6">0.377***</td>
<td data-row="5" data-col="7">1.46</td>
<td data-row="5" data-col="8">45.82</td>
<td data-row="5" data-col="9">1.458***</td>
<td data-row="5" data-col="10">1.18</td>
<td data-row="5" data-col="11">1.81</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
<div id="multinomial-logistic-model" class="section level2">
<h2>Multinomial Logistic model</h2>
<p>For multinomial logistic regression, each of the measures is a data frame, so the summary is a list of data frames, one for each non-reference level of the dependent variable.</p>
<pre>library(nnet)
tinom &lt;- multinom(Species ~ Petal.Length + Petal.Width + Sepal.Length
+ Sepal.Width, trace = FALSE, data = iris)
odds_summary(tinom)
$coefficient
Variables versicolor virginica
1 (Intercept) 18.690374 -23.836276
2 Petal.Length 14.244770 23.659779
3 Petal.Width -3.097684 15.135301
4 Sepal.Length -5.458424 -7.923634
5 Sepal.Width -8.707401 -15.370769
$t_value
versicolor virginica
1 0.53445109 -0.66644166
2 0.23665670 0.39128070
3 -0.06809815 0.32950063
4 -0.06072192 -0.08812701
5 -0.05544649 -0.09782845
$Odds_ratio
Variables versicolor virginica
1 (Intercept) 1.309563e+08 4.446690e-11
2 Petal.Length 1.536120e+06 1.885001e+10
3 Petal.Width 4.515366e-02 3.742635e+06
4 Sepal.Length 4.260265e-03 3.620841e-04
5 Sepal.Width 1.653575e-04 2.111348e-07
$Percent_odds
Variables versicolor virginica
1 (Intercept) 1.309563e+10 -1.000000e+02
2 Petal.Length 1.536119e+08 1.885001e+12
3 Petal.Width -9.548463e+01 3.742634e+08
4 Sepal.Length -9.957397e+01 -9.996379e+01
5 Sepal.Width -9.998346e+01 -9.999998e+01
$Coef_sig
versicolor virginica
1 18.69 -23.836
2 14.245 23.66
3 -3.098 15.135
4 -5.458 -7.924
5 -8.707 -15.371
$Odds_sig
versicolor virginica
1 130956302.531 0
2 1536119.713 18850009278.36
3 0.045 3742635.304
4 0.004 0
5 0 0
$p_value
versicolor virginica
1 0.5930295 0.5051288
2 0.8129231 0.6955898
3 0.9457075 0.7417773
4 0.9515807 0.9297757
5 0.9557828 0.9220685
$Confident_interval
Variables Lower versicolor Upper versicolor Lower virginica
1 (Intercept) -49.85184 87.23259 -93.93730
2 Petal.Length -103.72880 132.21834 -94.85441
3 Petal.Width -92.25354 86.05818 -74.89380
4 Sepal.Length -181.64381 170.72696 -184.14699
5 Sepal.Width -316.50312 299.08832 -323.31956
Upper virginica
1 46.26475
2 142.17397
3 105.16440
4 168.29972
5 292.57802</pre>
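<p>Because the multinomial summary is a named list, any single component can be pulled out with the usual <code>$</code> operator. A minimal sketch, continuing from the <code>tinom</code> model fitted above:</p>
<pre>## odds ratios only, one column per non-reference level
odds_summary(tinom)$Odds_ratio

## p values for each coefficient
odds_summary(tinom)$p_value</pre>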
</div>
</div>
<div id="conclusion" class="section level1">
<h1>Conclusion</h1>
<p>The odds_summary function represents a critical advancement in the practical interpretation of probabilistic estimates within the <em>Dyn4cast</em>
environment. Its significance lies not in computation alone, but in converting statistical output into actionable knowledge. In econometric modelling systems, prediction without interpretation is incomplete. The odds_summary function closes that gap.</p>
</div>
<div id="attrition" class="section level1">
<h1>Attribution</h1>
<pre>Cite this article as:
Nmadu J (2025). odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights. https://www.jobnmadu.com/r-blog/.
To cite package &#39;Dyn4cast&#39; in publications use:
Nmadu J (2025). _Dyn4cast: Dynamic Modeling and Machine Learning
Environment_. R package version 11.11.26,
&lt;https://github.com/JobNmadu/Dyn4cast&gt;.
A BibTeX entry for LaTeX users is
@Manual{,
title = {_Dyn4cast: Dynamic Modeling and Machine Learning Environment_},
author = {Job Nmadu},
year = {2025},
note = {R package version 11.11.26},
url = {https://github.com/JobNmadu/Dyn4cast},
}</pre>
<p>Welcome to Data Science and Machine Learning!</p>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jobnmadu.com/r-blog/2026-03-23-r-markdown/oddsratio/"> R-Blog on Data modelling to develop ...</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/">odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400070</post-id>	</item>
		<item>
		<title>Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</title>
		<link>https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/</link>
		
		<dc:creator><![CDATA[Florian Teschner]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/</guid>

					<description><![CDATA[<p>Short practical advice on attribution:</p>
<p>    Treat last-touch as a bias, not as a neutral baseline - If most advertisers optimize to last-touch conversions, budgets and auction pressure will drift toward channels that are easy to credit rather tha...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/">Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/"> Florian Teschner</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><img src="https://i2.wp.com/flovv.github.io/images/advanced-attribution-cover_0.jpg?w=578" alt="Beyond last touch cover" data-recalc-dims="1" /></p>

<h2 id="short-practical-advice-on-attribution">Short practical advice on attribution:</h2>

<ol>
  <li>
    <p><strong>Treat last-touch as a bias, not as a neutral baseline</strong> &#8211; If most advertisers optimize to last-touch conversions, budgets and auction pressure will drift toward channels that are easy to credit rather than channels that are truly incremental.</p>
  </li>
  <li>
    <p><strong>Advanced attribution is valuable because it changes spending behavior</strong> &#8211; The upside is not cleaner reporting. The upside is spending less on overpriced “credit-capturing” inventory and more on channels that create demand earlier in the journey.</p>
  </li>
  <li>
    <p><strong>Non-click measurement matters most in click-biased markets</strong> &#8211; If the market still rewards final clicks, then impression effects, sequence effects, and assisted conversions are systematically underpriced.</p>
  </li>
  <li>
    <p><strong>Do not expect more sophistication to automatically raise profit</strong> &#8211; Better measurement only helps if it changes bidding, budget allocation, and evaluation rules inside the company.</p>
  </li>
  <li>
    <p><strong>Add incrementality before adding complexity</strong> &#8211; A simple holdout, geo test, or lift study is often more useful than another dashboard built on the same click-based attribution logic.</p>
  </li>
</ol>

<h2 id="long-version">Long Version</h2>

<p>I just read Ron Berman’s paper <em>Beyond the Last Touch: Attribution in Online Advertising</em>, and I think the most useful implication for practitioners is slightly different from the headline result.</p>

<p>The paper compares no attribution, last-touch attribution, and Shapley-value attribution in a multi-publisher advertising market. Its central result is that <strong>attribution is not just a reporting layer. It changes bidding incentives and therefore changes market outcomes.</strong></p>

<p>That matters because most advertisers still do not operate with advanced measurement. In many cases they rely on some mix of:</p>

<ul>
  <li>last-touch or last-click attribution</li>
  <li>platform-reported conversions</li>
  <li>weak or infrequent incrementality testing</li>
  <li>budget allocation rules based on short-term ROAS</li>
</ul>

<p>In that world, advanced attribution has a second source of value: it helps you avoid the distortions created by everyone else’s bad measurement.</p>

<h3 id="what-the-paper-actually-shows">What the paper actually shows</h3>

<p>Berman models a market where one advertiser buys media across multiple publishers while other advertisers are more local. Consumers may see ads across different publishers, and one exposure can affect the value of another. That creates externalities across touchpoints.</p>

<p>The important result is that these externalities make optimization hard even when all parties have symmetric information. This is not just a fraud problem or a platform-information problem. It is a structural feature of multi-publisher advertising.</p>

<p>The paper then shows two things that are highly relevant in practice:</p>

<ol>
  <li><strong>Last-touch attribution often pushes advertisers to overbid.</strong></li>
  <li><strong>A more balanced attribution rule such as the Shapley value usually performs better than last touch for the advertiser.</strong></li>
</ol>

<p>There is also a useful warning in the paper: more accurate attribution does not always increase advertiser profit. Once you account for market equilibrium and competitor response, the relationship between better measurement and better outcomes is not trivial.</p>

<p>That is the academic result. The practical extension is where things get interesting.</p>

<h3 id="why-advanced-attribution-becomes-more-valuable-when-most-advertisers-still-use-last-touch">Why advanced attribution becomes more valuable when most advertisers still use last touch</h3>

<p>If most of the market uses last-touch logic, then the market is not being priced on incremental contribution. It is being priced on who gets the final credit.</p>

<p>That sounds like a reporting issue, but it is really a budget allocation issue.</p>

<p>Channels that close demand tend to look stronger than they are. Branded search, retargeting, affiliate traffic, and other lower-funnel placements often appear to be doing all the work because they sit close to the conversion event. Channels that create demand earlier in the journey often look weaker because they rarely receive the final touch.</p>

<p>If enough advertisers optimize this way, a predictable pattern emerges:</p>

<ul>
  <li>lower-funnel inventory becomes crowded and expensive</li>
  <li>channels that harvest existing intent absorb too much budget</li>
  <li>upper- and mid-funnel channels look worse than their true contribution</li>
  <li>advertisers confuse ease of measurement with causal impact</li>
</ul>

<p>This is exactly where advanced attribution helps.</p>

<p>It helps not because it gives you a prettier customer journey chart, but because it gives you a better pricing model for media. If your attribution system captures assisted effects, non-click influence, or incremental lift more accurately than the market standard, you can avoid overpaying for touchpoints that are merely good at showing up last.</p>

<h3 id="why-non-click-measurement-matters">Why non-click measurement matters</h3>

<p>The paper itself is about attribution rules, not a direct comparison of click-based versus non-click-based systems. But the implication is clear.</p>

<p>If the dominant market standard is click-biased last-touch measurement, then any method that can recover value from non-click exposures has a structural advantage.</p>

<p>That could include:</p>

<ul>
  <li>impression-based attribution</li>
  <li>conversion lift studies</li>
  <li>geo experiments</li>
  <li>holdout tests</li>
  <li>media mix modeling</li>
  <li>sequence analysis using first-party exposure data</li>
</ul>

<p>These methods are imperfect, and they answer slightly different questions. But they all do something last-touch usually cannot do well: they credit touchpoints that influence conversion without demanding a final click.</p>

<p>In a market full of click-based optimization, that means advanced measurement can uncover media that is genuinely incremental but systematically undervalued.</p>

<h3 id="the-real-gain-is-not-better-reporting-it-is-better-spending-behavior">The real gain is not better reporting. It is better spending behavior.</h3>

<p>I think this is the most important practical takeaway from the paper.</p>

<p>Advanced attribution should be judged by whether it improves decisions such as:</p>

<ul>
  <li>Which channels deserve more budget?</li>
  <li>Which channels are just harvesting demand that already exists?</li>
  <li>Where are we overbidding because the market over-credits the final touch?</li>
  <li>Which publishers look weak only because our measurement ignores assisted effects?</li>
</ul>

<p>If your attribution system changes those decisions, it has value.</p>

<p>If it only creates a more sophisticated dashboard while the company still optimizes to platform ROAS and last-click conversions, it has much less value.</p>

<p>That is also why many measurement projects disappoint. The model gets better, but the organization does not change the control system. Finance still trusts the old KPI. Paid media teams still chase the same targets. Creative teams still optimize for clicks because clicks are what get reported weekly.</p>

<p>The paper’s logic fits this well: attribution changes incentives. If the incentives do not change, the gains from attribution are limited.</p>

<h3 id="a-simple-mental-model-for-advertisers">A simple mental model for advertisers</h3>

<p>If most competitors use last touch, assume three things:</p>

<ol>
  <li><strong>Some inventory is overpriced because it captures credit, not because it creates lift.</strong></li>
  <li><strong>Some inventory is underpriced because it influences conversion without closing it.</strong></li>
  <li><strong>Your edge comes from estimating that gap better than the market.</strong></li>
</ol>

<p>That is why advanced attribution and incrementality measurement can be valuable even if they are noisy. You do not need a perfect model. You need a model that is less wrong than the one most of the market is using.</p>

<h3 id="where-the-paper-is-most-useful-and-where-it-is-limited">Where the paper is most useful, and where it is limited</h3>

<p>One thing I like about the paper is that it is transparent about trade-offs. It does not claim that better attribution always improves everything. In fact, some attribution rules improve advertiser profit partly by shifting profit away from publishers, not by massively improving total market efficiency.</p>

<p>That is a useful correction to the usual industry narrative that “better measurement” always makes the whole system work better.</p>

<p>At the same time, the paper is stylized. It uses a game-theoretic model with a limited number of publishers and advertisers. It is not a direct operational guide to how to run MMM, lift testing, or conversion APIs in a modern stack.</p>

<p>So I would use it for the strategic lesson, not for a literal implementation recipe.</p>

<p>The strategic lesson is:</p>

<p><strong>When the market overuses last-touch attribution, advanced measurement becomes more valuable because it helps advertisers resist the bidding and budgeting distortions created by everyone else’s simplistic measurement.</strong></p>

<h3 id="what-i-would-do-in-practice">What I would do in practice</h3>

<p>If I were working with an advertiser that still leans heavily on last-touch reporting, I would not start by building a highly complex attribution model.</p>

<p>I would start with a simpler sequence:</p>

<ol>
  <li>Keep last-touch reporting as an operational lens, not the source of truth.</li>
  <li>Add one credible incrementality layer such as a holdout test, geo test, or lift study.</li>
  <li>Separate channels that <strong>create demand</strong> from channels that mostly <strong>capture demand</strong>.</li>
  <li>Use attribution outputs to inform budget ranges, not to automate every bid immediately.</li>
  <li>Revisit evaluation metrics so teams are not punished for investing in channels that assist but rarely close.</li>
</ol>

<p>That is probably the highest-return path for most advertisers because the biggest issue is usually not lack of modeling sophistication. It is overdependence on a biased attribution rule.</p>

<h3 id="bottom-line">Bottom line</h3>

<p>If most advertisers still use last touch and weak testing, advanced attribution is more valuable than it first appears.</p>

<p>Not because it makes reports look better, but because it can help you spend against <strong>incremental contribution</strong> while the rest of the market spends against <strong>credited contribution</strong>.</p>

<p>That gap is where the advantage is.</p>

<hr />

<p>Paper: Ron Berman, <em>Beyond the Last Touch: Attribution in Online Advertising</em>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/"> Florian Teschner</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/">Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400041</post-id>	</item>
		<item>
		<title>Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</title>
		<link>https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/</link>
		
		<dc:creator><![CDATA[Stefano Mangiola]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>tidySummarizedExperiment logo</p>
<p>Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa<br />
The generality of tidySummarizedExperiment makes it easy to interface with several tidyverse packages (e.g. dplyr, tidyr, ggplot2, purrr, plotly...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/">Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/"> tidyomicsBlog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<div class="quarto-figure quarto-figure-left">
<figure class="figure">
<p><img src="https://i2.wp.com/tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/logo.png?w=150&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
<figcaption>tidySummarizedExperiment logo</figcaption>
</figure>
</div>
<p><em>Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa</em></p>
<p>The generality of <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a> makes it easy to interface with several <a href="https://www.tidyverse.org/" rel="nofollow" target="_blank"><code>tidyverse</code></a> packages (e.g. <a href="https://cran.r-project.org/package=dplyr" rel="nofollow" target="_blank"><code>dplyr</code></a>, <a href="https://cran.r-project.org/package=tidyr" rel="nofollow" target="_blank"><code>tidyr</code></a>, <a href="https://cran.r-project.org/package=ggplot2" rel="nofollow" target="_blank"><code>ggplot2</code></a>, <a href="https://cran.r-project.org/package=purrr" rel="nofollow" target="_blank"><code>purrr</code></a>, <a href="https://cran.r-project.org/package=plotly" rel="nofollow" target="_blank"><code>plotly</code></a>). This is possible thanks to its approach of converting <a href="https://bioconductor.org/packages/SummarizedExperiment" rel="nofollow" target="_blank"><code>SummarizedExperiment</code></a> objects to tibbles, performing operations, and converting back to the original format. This conversion process introduces substantial overhead when working with large-scale datasets. Each operation requires multiple data transformations, with the conversion to tibble format creating memory copies of the entire dataset, followed by the reverse conversion back to <a href="https://bioconductor.org/packages/SummarizedExperiment" rel="nofollow" target="_blank"><code>SummarizedExperiment</code></a>. For datasets containing hundreds of samples and tens of thousands of genes, these repeated conversions can consume memory and add significant computational overhead to even simple operations such as filtering or grouping.</p>
<p>With the new <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a> release (<a href="https://github.com/tidyomics/tidySummarizedExperiment/releases/tag/v1.19.7" rel="nofollow" target="_blank">v1.19.7</a>), we have introduced new optimisations that address these performance limitations. These optimisations work by:</p>
<ol type="1">
<li>Checking the query domain (assay, colData, rowData) and executing a specialised operation for that domain.</li>
<li>Using <a href="https://bioconductor.org/packages/plyxp" rel="nofollow" target="_blank"><code>plyxp</code></a> for complex domain-specific queries.</li>
</ol>
<p><em>plyxp</em> is a tidyomics package developed by <a href="https://github.com/jtlandis" rel="nofollow" target="_blank">Justin Landis</a>, and first released as part of Bioconductor 3.20 in October 2024. It uses data-masking functionality from the <a href="https://rlang.r-lib.org/" rel="nofollow" target="_blank">rlang</a> package to perform efficient operations on <em>SummarizedExperiment</em> objects.</p>
<section id="motivation-and-design-principles" class="level3">
<h3 class="anchored" data-anchor-id="motivation-and-design-principles">Motivation and design principles</h3>
<p>This benchmark supports ongoing work to improve the performance of <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a>. Below, we show up to a ~26x improvement in operations such as <code>mutate()</code>.</p>
<p>The current optimisation is grounded in three principles:</p>
<ul>
<li>Decompose operation series: break <code>mutate(a=..., b=..., c=...)</code> into single operations for simpler handling and clearer routing. Reference implementation in <code>R/mutate.R</code> (decomposition step) at <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L146" rel="nofollow" target="_blank">L146</a>.</li>
<li>Analyse scope: infer whether each expression targets <code>colData</code>, <code>rowData</code>, <code>assays</code>, or a mix (noting that the current analyser is likely over-engineered and could be simplified). See <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L149" rel="nofollow" target="_blank">L149</a>.</li>
<li>Route mixed operations via plyxp: when an expression touches multiple slots, prefer the plyxp path for correctness and performance. See <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L155" rel="nofollow" target="_blank">L155</a>.</li>
</ul>
<p>These design choices aim to preserve dimnames, avoid unnecessary tibble round-trips, and provide predictable performance across simple and mixed-slot scenarios.</p>
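<p>As a rough illustration of the scope-analysis principle, scope inference can be done in base R with <code>all.vars()</code>: collect the variables an expression references and check which slot's names they belong to. The function below is a simplified stand-in for illustration, not the package's actual analyser:</p>

```r
# Minimal scope classifier (illustrative only): decide whether a mutate()
# expression touches colData, rowData, assays, or a mix of them
classify_scope <- function(expr, coldata_vars, rowdata_vars, assay_names) {
  vars <- all.vars(expr)  # every symbol referenced by the expression
  touches <- c(
    coldata_only = any(vars %in% coldata_vars),
    rowdata_only = any(vars %in% rowdata_vars),
    assay_only   = any(vars %in% assay_names)
  )
  if (sum(touches) > 1) "mixed"                 # route to the plyxp path
  else if (sum(touches) == 1) names(touches)[touches]
  else "unknown"                                # fall back to the tibble path
}

classify_scope(quote(avgLength + 5), "avgLength", "gene_id", "counts")
# "coldata_only"
classify_scope(quote(counts * avgLength), "avgLength", "gene_id", "counts")
# "mixed"
```

<p>The real analyser must also handle grouping, key columns, and functions of multiple columns, which is where most of its complexity comes from.</p>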
</section>
<section id="example-of-code-optimisation" class="level3">
<h3 class="anchored" data-anchor-id="example-of-code-optimisation">Example of code optimisation</h3>
<p>This was the <code>mutate()</code> method before optimisation. The previous implementation relied on <code>as_tibble() |&gt; dplyr::mutate() |&gt; update_SE_from_tibble(.data)</code>.</p>
<p>The function <code>update_SE_from_tibble</code> interprets the input tibble and converts it back to a <code>SummarizedExperiment</code>. Although this step provides great generality and flexibility, it is particularly expensive because it must infer whether columns are sample-wise or feature-wise.</p>
<div class="cell">
<details class="code-fold">
<summary>Show pre-optimization source</summary>
<pre>mutate.SummarizedExperiment &lt;- function(.data, ...) {
    # Legacy implementation of mutate() for SummarizedExperiment:
    # - Validates requested edits against special/view-only columns
    # - Performs mutate() via tibble round-trip, then reconstructs the SE
    # Check that we are not modifying a key column
    cols &lt;- enquos(...) |&gt; names()
    
    # Deprecation of special column names:
    # capture all quoted args to detect deprecated special-column usage
    .cols &lt;- enquos(..., .ignore_empty=&quot;all&quot;) %&gt;% 
        map(~ quo_name(.x)) %&gt;% unlist()
    if (is_sample_feature_deprecated_used(.data, .cols)) {
        # Record deprecated usage into metadata for backward compatibility
        .data &lt;- ping_old_special_column_into_metadata(.data)
    }
    
    # Identify view-only/special columns (sample/feature keys, etc.)
    # Use a small slice to reduce overhead while probing structure
    special_columns &lt;- get_special_columns(
        # Decrease the size of the dataset
        .data[1:min(100, nrow(.data)), 1:min(20, ncol(.data))]
    ) |&gt; c(get_needed_columns(.data))
    
    # Are any requested targets among special/view-only columns?
    tst &lt;-
        intersect(
            cols,
            special_columns
        ) |&gt; 
        length() |&gt;
        gt(0)

    if (tst) {
        columns &lt;-
            special_columns |&gt;
                paste(collapse=&quot;, &quot;)
        stop(
            &quot;tidySummarizedExperiment says:&quot;,
            &quot; you are trying to rename a column that is view only&quot;,
            columns,
            &quot;(it is not present in the colData).&quot;,
            &quot; If you want to mutate a view-only column,&quot;,
            &quot; make a copy and mutate that one.&quot;
        )
    }

    # If Ranges column not in query, prefer faster tibble conversion
    # Skip expanding GRanges columns when not referenced
    skip_GRanges &lt;-
        get_GRanges_colnames() %in% 
        cols |&gt;
        not()
    
    # Round-trip: SE -&gt; tibble -&gt; dplyr::mutate -&gt; SE
    .data |&gt;
        as_tibble(skip_GRanges=skip_GRanges) |&gt;
        dplyr::mutate(...) |&gt;
        update_SE_from_tibble(.data)
}</pre>
</details>
</div>
<p>The new implementation captures all easy cases, such as sample-only and feature-only metadata <code>mutate()</code>. If <code>mutate()</code> is a mixed operation that can be factored into sample- and feature-wise operations, it is handled by <code>plyxp</code>. Otherwise, the general solution is used.</p>
<p>Key components to compare:</p>
<ul>
<li>The pre-optimization code always uses a tibble round-trip (<code>as_tibble() |&gt; dplyr::mutate() |&gt; update_SE_from_tibble()</code>).</li>
<li>The optimized code first analyzes scope (<code>colData</code>, <code>rowData</code>, <code>assay</code>, or mixed) and dispatches to specialized paths.</li>
<li>The fallback still exists (<code>mutate_via_tibble</code>) for complex cases, preserving generality.</li>
</ul>
<div class="cell">
<details class="code-fold">
<summary>Show post-optimization source</summary>
<pre>mutate.SummarizedExperiment &lt;- function(.data, ...) {

    # Check if query is composed (multiple expressions)
    if (is_composed(&quot;mutate&quot;, ...)) return(decompose_tidy_operation(&quot;mutate&quot;, ...)(.data))

    # Check for scope and dispatch elegantly
    scope_report &lt;- analyze_query_scope_mutate(.data, ...)
    scope &lt;- scope_report$scope

    result &lt;-
        if (scope == &quot;coldata_only&quot;) modify_samples(.data, &quot;mutate&quot;, ...)
        else if (scope == &quot;rowdata_only&quot;) modify_features(.data, &quot;mutate&quot;, ...)
        else if (scope == &quot;assay_only&quot;) mutate_assay(.data, ...)
        else if (scope == &quot;mixed&quot;) modify_se_plyxp(.data, &quot;mutate&quot;, scope_report, ...)
        else mutate_via_tibble(.data, ...)

    # Record latest mutate scope into metadata for testing/introspection
    meta &lt;- S4Vectors::metadata(result)
    if (is.null(meta)) meta &lt;- list()
    meta$latest_mutate_scope_report &lt;- scope_report
    S4Vectors::metadata(result) &lt;- meta

    return(result)
}</pre>
</details>
</div>
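<p>The decomposition step that <code>decompose_tidy_operation()</code> performs can be illustrated in base R without any of the package internals: split a multi-assignment <code>mutate()</code> into single assignments applied in order, so later expressions can see earlier results. The helper below is a hypothetical sketch that works on plain data frames only:</p>

```r
# Illustrative decomposition of mutate(a = ..., b = ...) into single steps
decompose_mutate <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # capture unevaluated expressions
  for (nm in names(exprs)) {
    # Evaluate each expression against the current data, then store the result,
    # so z = y + 1 can refer to a column y created one step earlier
    .data[[nm]] <- eval(exprs[[nm]], envir = .data, enclos = parent.frame())
  }
  .data
}

df <- decompose_mutate(data.frame(x = 1:3), y = x * 2, z = y + 1)
df$z  # 3 5 7
```

<p>Once each expression stands alone, routing it to a colData-, rowData-, or assay-specific path becomes a per-expression decision rather than a whole-call one.</p>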
</section>
<section id="benchmarking-overview" class="level1">
<h1>Benchmarking Overview</h1>
<p>This vignette benchmarks a set of <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/mutate.html" rel="nofollow" target="_blank"><code>mutate()</code></a>, <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/filter.html" rel="nofollow" target="_blank"><code>filter()</code></a>, <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/select.html" rel="nofollow" target="_blank"><code>select()</code></a>, and <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/distinct.html" rel="nofollow" target="_blank"><code>distinct()</code></a> scenarios (see <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank">documentation</a>), comparing performance before and after optimisation. Each commit is checked out explicitly via <code>git worktree</code>, its code is loaded with <code>devtools::load_all()</code>, the same scenarios are run multiple times, and the runtimes are compared with ggplot boxplots.</p>
<ul>
<li>Before optimisation: <a href="https://github.com/tidyomics/tidySummarizedExperiment/commit/87445757d2d0332e7d335d22cd28f73568b7db66" rel="nofollow" target="_blank">commit 87445757d2d0332e7d335d22cd28f73568b7db66</a></li>
<li>After optimisation: <a href="https://github.com/tidyomics/tidySummarizedExperiment/commit/9f7c26e0519c92f9682b270d566da127367bcbc0" rel="nofollow" target="_blank">commit 9f7c26e0519c92f9682b270d566da127367bcbc0</a></li>
</ul>
<section id="setup-helper-functions" class="level3">
<h3 class="anchored" data-anchor-id="setup-helper-functions">Setup helper functions</h3>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(SummarizedExperiment)
  library(rlang)
  library(devtools)
  library(airway)
  library(microbenchmark)
  library(reactable)
  library(patchwork)
})

load_branch_code &lt;- function(worktree_dir) {
  if (!requireNamespace(&quot;devtools&quot;, quietly = TRUE)) stop(&quot;Please install devtools to run this vignette.&quot;)
  # Debug: print the directory we're looking for
  cat(&quot;Looking for worktree directory:&quot;, worktree_dir, &quot;\n&quot;)
  cat(&quot;Directory exists:&quot;, dir.exists(worktree_dir), &quot;\n&quot;)
  cat(&quot;Current working directory:&quot;, getwd(), &quot;\n&quot;)
  # Check if directory exists
  if (!dir.exists(worktree_dir)) {
    stop(paste(&quot;Worktree directory does not exist:&quot;, worktree_dir))
  }
  suppressMessages(devtools::load_all(worktree_dir, quiet = TRUE))
}

create_airway_test_se &lt;- function() {
  suppressPackageStartupMessages(library(airway))
  data(airway)
  se &lt;- airway
  se[1:200, ]
}

benchmark_scenarios &lt;- function() {
  list(
    coldata_simple_assignment = quo({ se |&gt; mutate(new_dex = dex) }),
    coldata_arithmetic = quo({ se |&gt; mutate(avgLength_plus_5 = avgLength + 5) }),
    coldata_concat = quo({ se |&gt; mutate(sample_info = paste(cell, dex, SampleName, sep = &quot;_&quot;)) }),
    coldata_grouped_mean = quo({ se |&gt; group_by(dex) |&gt; mutate(avgLength_group_mean = mean(avgLength)) |&gt; ungroup() }),
    assay_simple_assignment = quo({ se |&gt; mutate(counts_copy = counts) }),
    assay_plus_one = quo({ se |&gt; mutate(counts_plus_1 = counts + 1) }),
    assay_log = quo({ se |&gt; mutate(log_counts_manual = log2(counts + 1)) }),
    complex_conditional_coldata = quo({ se |&gt; mutate(length_group = ifelse(avgLength &gt; mean(avgLength), &quot;longer&quot;, &quot;shorter&quot;)) }),
    complex_nested = quo({ se |&gt; mutate(complex_category = ifelse(dex == &quot;trt&quot; & avgLength &gt; mean(avgLength), &quot;treated_long&quot;, ifelse(dex == &quot;untrt&quot;, &quot;untreated&quot;, &quot;other&quot;))) }),
    mixed_assay_coldata = quo({ se |&gt; mutate(new_counts = counts * avgLength) }),
    multiple_simple_assay = quo({ se |&gt; mutate(normalized_counts = counts / 1000, sqrt_counts = sqrt(counts)) }),
    chained_mutates = quo({ se |&gt; mutate(tmp = avgLength * 2) |&gt; mutate(flag = ifelse(tmp &gt; mean(tmp), 1, 0)) }),

    # Filter benchmarks (scoped and non-rectangular)
    filter_coldata_simple = quo({ se |&gt; filter(dex == &quot;trt&quot;) }),
    filter_coldata_numeric = quo({ se |&gt; filter(avgLength &gt; median(avgLength)) }),
    filter_assay_nonrect = quo({ se |&gt; filter(counts &gt; 0) }),

    # Select benchmarks (covering colData-only, rowData-only, assays-only, mixed)
    select_coldata_simple = quo({ se |&gt; select(.sample, dex) }),
    select_rowdata_simple = quo({ se |&gt; select(.feature) }),
    select_assay_only = quo({ se |&gt; select(counts) }),
    select_mixed_keys_counts = quo({ se |&gt; select(.sample, .feature, counts) }),
    select_coldata_wide = quo({ se |&gt; select(.sample, dex, avgLength, SampleName) }),

    # Distinct benchmarks (covering colData-only, rowData-only, assays-only, mixed)
    distinct_coldata_simple = quo({ se |&gt; distinct(dex) }),
    distinct_coldata_multiple = quo({ se |&gt; distinct(dex, avgLength) }),
    distinct_rowdata_simple = quo({ se |&gt; distinct(.feature) }),
    distinct_assay_only = quo({ se |&gt; distinct(counts) }),
    distinct_mixed_keys_counts = quo({ se |&gt; distinct(.sample, .feature, counts) }),
    distinct_coldata_wide = quo({ se |&gt; distinct(.sample, dex, avgLength, SampleName) }),
    distinct_with_keep_all = quo({ se |&gt; distinct(dex, .keep_all = TRUE) }),
    distinct_complex_expression = quo({ se |&gt; distinct(dex, avgLength) })
  )
}

run_one &lt;- function(expr_quo, reps = 5L) {
  se_base &lt;- create_airway_test_se()
  mb &lt;- microbenchmark::microbenchmark(
    eval_tidy(expr_quo),
    times = reps,
    setup = { se &lt;- se_base },          # reuse the same input, avoid recreating inside the timed expr
    control = list(warmup = 2L)
  )
  # microbenchmark returns nanoseconds; convert to milliseconds
  as.numeric(mb$time) / 1e6
}

run_all_scenarios &lt;- function(branch_label, reps = 7L) {
  scenarios &lt;- benchmark_scenarios()
  out &lt;- list()
  for (nm in names(scenarios)) {
    tms &lt;- run_one(scenarios[[nm]], reps = reps)
    out[[length(out) + 1L]] &lt;- data.frame(
      branch = branch_label,
      scenario = nm,
      replicate = seq_along(tms),
      elapsed_ms = tms,
      stringsAsFactors = FALSE
    )
  }
  bind_rows(out)
}

# Parallel version: run each scenario on a separate worker
run_all_scenarios_parallel &lt;- function(branch_label, reps = 20L, workers = 1L, initializer = NULL) {
  scenarios &lt;- benchmark_scenarios()
  nms &lt;- names(scenarios)
  old_plan &lt;- future::plan()
  on.exit(future::plan(old_plan), add = TRUE)
  future::plan(future::multisession, workers = workers)
  res &lt;- future.apply::future_lapply(nms, function(nm) {
    if (!is.null(initializer)) initializer()
    tms &lt;- run_one(scenarios[[nm]], reps = reps)
    data.frame(
      branch = branch_label,
      scenario = nm,
      replicate = seq_along(tms),
      elapsed_ms = tms,
      stringsAsFactors = FALSE
    )
  }, future.seed = TRUE)
  dplyr::bind_rows(res)
}</pre>
</details>
</div>
</section>
<section id="run-benchmarks-on-both-branches" class="level3">
<h3 class="anchored" data-anchor-id="run-benchmarks-on-both-branches">Run benchmarks on both branches</h3>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre># Worktree directories (already exist in the post directory)
wt_before &lt;- &quot;.__bench_before__&quot;
wt_after &lt;- &quot;.__bench_after__&quot;

# Verify worktrees exist
if (!dir.exists(wt_before)) {
  stop(&quot;Worktree directory does not exist: &quot;, wt_before)
}
if (!dir.exists(wt_after)) {
  stop(&quot;Worktree directory does not exist: &quot;, wt_after)
}

# Before optimisation (commit 87445757)
load_branch_code(wt_before)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>Looking for worktree directory: .__bench_before__ 
Directory exists: TRUE 
Current working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization </pre>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre>res_before &lt;- run_all_scenarios(branch_label = &quot;before_optimization&quot;, reps = 10L)

# After optimisation (commit 9f7c26e)
load_branch_code(wt_after)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>Looking for worktree directory: .__bench_after__ 
Directory exists: TRUE 
Current working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization </pre>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre>res_after &lt;- run_all_scenarios(branch_label = &quot;after_optimization&quot;, reps = 10L)

results &lt;- dplyr::bind_rows(res_before, res_after) |&gt;
  dplyr::mutate(operation = dplyr::case_when(
    grepl(&quot;^filter&quot;, scenario) ~ &quot;filter&quot;,
    grepl(&quot;^select&quot;, scenario) ~ &quot;select&quot;,
    grepl(&quot;^distinct&quot;, scenario) ~ &quot;distinct&quot;,
    TRUE ~ &quot;mutate&quot;
  ))

summary_table &lt;- results |&gt;
  group_by(branch, scenario) |&gt;
  summarise(median_ms = median(elapsed_ms), .groups = &quot;drop&quot;) |&gt;
  tidyr::pivot_wider(names_from = branch, values_from = median_ms) |&gt; 
  dplyr::mutate(speedup = round(before_optimization / after_optimization, 2))</pre>
</details>
</div>
<div class="cell">
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-226268d0e09b28952dc5" style="width:auto;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-226268d0e09b28952dc5">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"scenario":["assay_log","assay_plus_one","assay_simple_assignment","chained_mutates","coldata_arithmetic","coldata_concat","coldata_grouped_mean","coldata_simple_assignment","complex_conditional_coldata","complex_nested","distinct_assay_only","distinct_coldata_multiple","distinct_coldata_simple","distinct_coldata_wide","distinct_complex_expression","distinct_mixed_keys_counts","distinct_rowdata_simple","distinct_with_keep_all","filter_assay_nonrect","filter_coldata_numeric","filter_coldata_simple","mixed_assay_coldata","multiple_simple_assay","select_assay_only","select_coldata_simple","select_coldata_wide","select_mixed_keys_counts","select_rowdata_simple"],"after_optimization":[12.4554805,12.328708,11.681375,21.5625,11.0071665,11.0921455,93.9895835,11.321708,11.3674785,11.787646,183.6270415,104.3887705,103.090917,172.007979,108.229458,211.791104,102.567687,116.4211665,74.754,21.781209,22.755375,29.610208,25.965729,174.4993535,98.666563,102.3431045,115.9610005,97.5994795],"before_optimization":[289.085042,289.313729,288.7251875,548.727708,288.0155005,281.744729,88.8149785,291.0410415,271.0590425,274.0804585,197.556417,118.027792,116.40925,119.396937,117.7116455,199.8871875,114.4365215,118.1221665,104.8900625,25.817354,26.3123745,283.314792,299.2484995,181.386271,123.428688,113.5046455,117.837937,113.6867925],"speedup":[23.21,23.47,24.72,25.45,26.17,25.4,0.94,25.71,23.85,23.25,1.08,1.13,1.13,0.69,1.09,0.94,1.12,1.01,1.4,1.19,1.16,9.57,11.52,1.04,1.25,1.11,1.02,1.16]},"columns":[{"id":"scenario","name":"Scenario","type":"character","minWidth":220,"align":"left"},{"id":"after_optimization","name":"After (ms)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":1},"aggregated":{"digits":1}}},{"id":"before_optimization","name":"Before (ms)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":1},"aggregated":{"digits":1}}},{"id":"speedup","name":"Speedup (x)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":2},"aggregated":{"digits":2}}}],"filterable":true,"searchable":true,"defaultPageSize":10,"highlight":true,"bordered":true,"striped":true,"compact":true,"dataKey":"5e96953b8fc47c84831c51a6f5bf258f"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
</section>
<section id="visualize-with-combined-performance-plots" class="level1">
<h1>Visualize with combined performance plots</h1>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>dodge_w &lt;- 0.7

p_box &lt;- ggplot(results, aes(x = scenario, y = elapsed_ms, fill = branch)) +
  geom_boxplot(position = position_dodge(width = dodge_w), width = 0.7, outlier.shape = NA) +

  # Add jittered points aligned with the dodged boxplots
  geom_point(
    position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = dodge_w), 
    alpha = 0.6, 
    size = 0.5
  ) +
  scale_y_log10() + 
  coord_flip() +
  facet_grid(operation ~ ., scales = &quot;free_y&quot;, space = &quot;free_y&quot;) +
  annotation_logticks(sides = &quot;b&quot;) +
  labs(title = &quot;Performance comparison: Before vs After optimization&quot;,
       x = &quot;Scenario&quot;,
       y = &quot;Elapsed (ms)&quot;) +
  theme_bw() +
  
  # Angle x labels  
  theme(legend.position = &quot;top&quot;, axis.text.x = element_text(angle = 45, hjust = 1))

# Speedup summary panel (median before/after ratio)
speedup_plot_data &lt;- summary_table |&gt;
  dplyr::mutate(operation = dplyr::case_when(
    grepl(&quot;^filter&quot;, scenario) ~ &quot;filter&quot;,
    grepl(&quot;^select&quot;, scenario) ~ &quot;select&quot;,
    grepl(&quot;^distinct&quot;, scenario) ~ &quot;distinct&quot;,
    TRUE ~ &quot;mutate&quot;
  ))

p_speedup &lt;- ggplot(
  speedup_plot_data,
  aes(x = speedup, y = reorder(scenario, speedup))
) +
  geom_col(width = 0.7, fill = &quot;grey70&quot;, color = &quot;grey40&quot;) +
  facet_grid(operation ~ ., scales = &quot;free_y&quot;, space = &quot;free_y&quot;) +
  labs(
    title = &quot;Median speedup by scenario&quot;,
    x = &quot;Speedup (before/after, x)&quot;,
    y = NULL
  ) +
  theme_bw() +
  theme(legend.position = &quot;none&quot;)

combined_plot &lt;- p_box + p_speedup + patchwork::plot_layout(widths = c(2.3, 1))
combined_plot</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i0.wp.com/tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/index_files/figure-html/plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre># Save the combined figure
ggsave(&quot;benchmark_plot.png&quot;, plot = combined_plot, width = 14, height = 8)</pre>
</details>
</div>
<section id="interpreting-the-benchmark-results" class="level3">
<h3 class="anchored" data-anchor-id="interpreting-the-benchmark-results">Interpreting the benchmark results</h3>
<p>Across all scenarios, speedup ranges from <strong>0.69x</strong> to <strong>26.17x</strong>.</p>
<p>Operations with the strongest gains are: <strong>coldata_arithmetic (26.17x), coldata_simple_assignment (25.71x), chained_mutates (25.45x)</strong>.</p>
<p>Lower-gain outliers are: <strong>distinct_coldata_wide (0.69x), coldata_grouped_mean (0.94x), distinct_mixed_keys_counts (0.94x)</strong>.</p>
<p>By operation family, median speedup is: <strong>mutate (23.66x), filter (1.19x), select (1.11x), distinct (1.08x)</strong>.</p>
</section>
</section>
<section id="session-info" class="level1">
<h1>Session Info</h1>
<div class="cell">
<div class="cell-output cell-output-stdout">
<pre>R version 4.5.3 (2026-03-11)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Adelaide
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] tidySummarizedExperiment_1.19.7 tidyr_1.3.2                    
 [3] testthat_3.3.2                  ttservice_0.5.3                
 [5] patchwork_1.3.2                 reactable_0.4.5                
 [7] rlang_1.1.7                     microbenchmark_1.5.0           
 [9] airway_1.30.0                   SummarizedExperiment_1.40.0    
[11] Biobase_2.70.0                  GenomicRanges_1.62.1           
[13] Seqinfo_1.0.0                   IRanges_2.44.0                 
[15] S4Vectors_0.48.0                BiocGenerics_0.56.0            
[17] generics_0.1.4                  MatrixGenerics_1.22.0          
[19] matrixStats_1.5.0               dplyr_1.2.0                    
[21] ggplot2_4.0.2                   devtools_2.4.6                 
[23] usethis_3.2.1                  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1    viridisLite_0.4.3   farver_2.1.2       
 [4] S7_0.2.1            fastmap_1.2.0       lazyeval_0.2.2     
 [7] digest_0.6.39       plyxp_1.4.3         lifecycle_1.0.5    
[10] ellipsis_0.3.2      magrittr_2.0.4      compiler_4.5.3     
[13] tools_4.5.3         yaml_2.3.12         data.table_1.18.2.1
[16] knitr_1.51          S4Arrays_1.10.1     labeling_0.4.3     
[19] htmlwidgets_1.6.4   pkgbuild_1.4.8      DelayedArray_0.36.0
[22] RColorBrewer_1.1-3  pkgload_1.5.0       abind_1.4-8        
[25] withr_3.0.2         purrr_1.2.1         desc_1.4.3         
[28] grid_4.5.3          fansi_1.0.7         scales_1.4.0       
[31] cli_3.6.5           rmarkdown_2.30      ragg_1.5.1         
[34] remotes_2.5.0       otel_0.2.0          rstudioapi_0.18.0  
[37] httr_1.4.8          sessioninfo_1.2.3   cachem_1.1.0       
[40] stringr_1.6.0       XVector_0.50.0      vctrs_0.7.1        
[43] Matrix_1.7-4        jsonlite_2.0.0      systemfonts_1.3.2  
[46] crosstalk_1.2.2     plotly_4.12.0       glue_1.8.0         
[49] reactR_0.6.1        stringi_1.8.7       gtable_0.3.6       
[52] tibble_3.3.1        pillar_1.11.1       htmltools_0.5.9    
[55] brio_1.1.5          R6_2.6.1            textshaping_1.0.5  
[58] rprojroot_2.1.1     evaluate_1.0.5      lattice_0.22-9     
[61] memoise_2.0.1       SparseArray_1.10.9  xfun_0.56          
[64] fs_1.6.7            pkgconfig_2.0.3    </pre>
</div>
</div>


</section>

<p>
© 2025 tidyomics. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/"> tidyomicsBlog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/">Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</a>]]></content:encoded>
					
		
		<enclosure url="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/benchmark_plot.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400026</post-id>	</item>
		<item>
		<title>Pacific island demography, the narrative by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Sat, 21 Mar 2026 12:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/03/22/pacific-people-narrative</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> This post is the last in a series of seven on population and people movement issues in the Pacific. The first six posts featured code re-generating the charts I used in a keynote speech before the November 2025 meeting of the Pacific Heads of Planning ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/">Pacific island demography, the narrative by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://freerangestats.info/blog/2026/03/22/pacific-people-narrative"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>This post is the last in a series of seven on population and people movement issues in the Pacific. The first six posts featured code re-generating the charts I used in a keynote speech before the November 2025 meeting of the Pacific Heads of Planning and Statistics in Wellington, New Zealand. Today’s post is simply a narrative drawing on all those charts. There’s no R code today; links to the previous posts are at the bottom.</p>

<h2 id="population-growth-is-varied-but-the-larger-pacific-island-countries-are-growing-pretty-fast">Population growth is varied, but the larger Pacific island countries are growing pretty fast</h2>

<p>The first point to make is that the Pacific is very varied in terms of its population dynamics. We can see this a bit in the most straightforward and intuitive chart of population historical growth and future projections:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-line.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0305-population-line.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>The data come from the United Nations&#8217; 2024 population projections, which are currently used as-is in the Pacific Data Hub. Some of these figures are known to be wrong, either because there have been censuses that reported too late to be taken into account (e.g. Federated States of Micronesia) or for other reasons (e.g. Tokelau). But they&#8217;re a good starting point.</p>

<p>The vertical scale on that chart is &#8220;free&#8221;, meaning each country facet is on a different scale, so it&#8217;s not easy to tell visually which countries are larger or smaller. To help with this, they are organised from smallest (Niue) to largest (Papua New Guinea). Pitcairn Islands, although a member of the Pacific Community, is not included because of its tiny size (around 50 people) even by Pacific standards.</p>

<p>A bit of familiarity with the region, helped by the sequencing on the chart, identifies that the larger countries&#8212;those in the bottom row from Samoa to Papua New Guinea, up to say Kiribati and Guam in the row above&#8212;are growing faster as well as being larger. Well, that&#8217;s hardly surprising, is it? Faster-growing countries will of course become larger, so we&#8217;d expect these things to be related. Except that the causality is not all in one direction like that. At its extreme, there&#8217;s more to the fact that Niue is only one or two thousand people and Papua New Guinea is 10,000 times its size (about 11 million) than that Papua New Guinea has grown faster recently. Niue&#8217;s small physical land mass, along with other characteristics, is one of the reasons it doesn&#8217;t grow as fast.</p>

<p>The next chart is an attempt to illustrate this further. First, consider a version where we just put size on the horizontal axis and growth on the vertical. Both axes are logarithmically transformed, or else the chart would be all blank space with Papua New Guinea out on the right and all other countries a cluster of dots on top of each other on the left.</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-scatter.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0305-population-scatter.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>I quite like this chart, for audiences with enough numeracy to cope with a scatter plot and log axes. The pink shading makes clear which countries are shrinking, a point of definite interest. The colour-coding of points by sub-region is very useful, making clear how the Melanesian countries cluster together in the top right “large and growing fast” quadrant; most Polynesian countries (not all) are shrinking; and Micronesia has a real mix. The next version of this chart adds a particularly interesting element to this (in a slide show, you can move from one chart to the next and it looks like these circles appear by magic):</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-scatter-highlighted.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0305-population-scatter-highlighted.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Those circles represent the countries or territories that have some kind of easy migration access to a larger, richer country. This includes (conveniently, mostly in sets of three):</p>

<ul>
  <li>three French territories (French Polynesia, Wallis and Futuna, New Caledonia);</li>
  <li>three Realm of New Zealand territories (Cook Islands, Tokelau, Niue);</li>
  <li>Pitcairn Island whose residents (mostly) have right of abode in the UK;</li>
  <li>three USA territories (Guam, Commonwealth of the Northern Mariana Islands, American Samoa);</li>
  <li>three independent countries with a Compact of Free Association with the USA (Palau, Marshall Islands, Federated States of Micronesia).</li>
</ul>

<p>Now we have an interesting feature. All of the countries with negative growth apart from Tonga and Tuvalu are highlighted this way. Perhaps in future we would include Tuvalu in the list above because it does now have special arrangements with Australia and New Zealand that allow a certain amount of people movement, most importantly the Falepili Mobility Pathway scheme that allows 280 permanent residence visas per year with Australia. But this is too recent to show up in the chart above.</p>

<p>Samoa and Tonga do not have automatic right of residence to New Zealand but they have strong cultural and historical ties, large communities already living there (more on this later), and the visa obstacles are largely surmountable.</p>

<p>In my view the distinction between the countries that have easy mobility to a larger, richer country, and those that don&#8217;t, is the most important single marker to use when considering population issues in the Pacific.</p>

<h2 id="net-migration-is-a-critical-factor-in-different-population-growth-rates">Net migration is a critical factor in different population growth rates</h2>

<p>OK, let&#8217;s look at net migration. The United Nations has to estimate this as part of its projection process, so we can see it for all the countries in its dataset. In the chart below, I&#8217;ve organised countries from those least impacted by migration (top left) to most (bottom right).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0309-all-picts-net-migration.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0309-all-picts-net-migration.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>There are a few remarkable things here. One is that the chart is mostly red&#8212;migration is nearly all outwards. Another is that it leaps around a bit. We can suspect data problems for some of this: e.g. in the case of Tokelau&#8217;s recent years, I just think we have an error (it&#8217;s being looked at).</p>

<p>One way of drilling into the impact of migration on population, for just a subset of countries because it&#8217;s getting complicated, is to compare the natural rate of increase (i.e. births minus deaths) with net migration; between them, these add up to the total change in population for each country. Here&#8217;s a chart that does that, for just six countries:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0309-six-picts-natural-immmigration-line.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0309-six-picts-natural-immmigration-line.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>In each of these charts, we can start with the green dashed line, which is where natural increase would be with no migration. Then we add (or subtract, in most cases) the red dotted line which is net migration. The sum of these two is the blue solid line, total population change. The sort of things we see here are:</p>

<ul>
  <li>For Kiribati and Papua New Guinea, the solid blue line is fairly close to the dashed green line, indicating that natural increase is what is driving population change</li>
  <li>For Samoa and Marshall Islands, the green dashed natural increase line is very positive and the red dotted migration line is very negative. These two drivers mostly cancel each other out, but when all is added up Marshall Islands has rapid recent population decline and Samoa still has some remnant slow population growth</li>
  <li>For Niue and Marianas, the story is more complicated but in recent years has stabilised at “not much change”</li>
</ul>

<p>What we’re seeing here is that migration—or the lack of it in significant numbers, for Kiribati and Papua New Guinea—is what is driving the population story.</p>

<h2 id="migration-short-and-long-term-changes-the-shape-of-the-origin-countrys-demographics">Migration (short and long term) changes the <i>shape</i> of the origin country’s demographics</h2>

<p>What does this mean for the structure of who is left? To illustrate this, I like to compare Kiribati and Marshall Islands. Both are entirely or mostly coral atolls; they are only around an hour&#8217;s flight from each other; and they have few natural resources other than their people, the ocean and its fish, and location.</p>

<p>Location is a critical asset or curse for the Marshall Islands. Kwajalein Atoll in the Marshalls was a major Japanese base in World War II and the site of a bloody battle in 1944; now it is a key US base forming a bridging zone and operational depth between the so-called second (Guam, Palau, Saipan, etc.) and third (Aleutians, Hawaii, Samoa, New Zealand) <a href="https://en.wikipedia.org/wiki/Island_chain_strategy" rel="nofollow" target="_blank">island chains</a> in preparation for the next Pacific war, against whichever eastwards-facing Asian land power that might be. Bikini Atoll in the Marshalls was the site of US nuclear weapon testing during the cold war. For our purposes, all this matters because Marshall Islands has a Compact of Free Association with the USA which provides large amounts of funding plus free people movement to the USA. Even with recent crackdowns in the USA this persists, although dealing with Marshallese who have been forcibly returned from the USA due to low level criminal behaviour is becoming a policy challenge.</p>

<p>Kiribati has no such arrangement with its own large regional partner, Australia (former-British Kiribati uses the Australian dollar and drives on the left, just as Marshall Islands uses the US dollar and drives on the right hand side of the road).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0314-kiribati-marshalls.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0314-kiribati-marshalls.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>To repeat some text from the blog post where I introduced this chart: Kiribati today has about four times the population of Marshall Islands but in 1980 was only about double. The significant thing here is the wasp waist of the Marshall Islands pyramid in 2025—while it had a similar shape to Kiribati in 1980. People at peak working and reproductive age are literally absent from today’s Marshall Islands—in this case, primarily in the USA.</p>

<p>The result of this is that Marshall Islands not only benefits from its individuals having more freedom of movement and opportunity, and sending back remittances from relatively high paying lives in the USA; but also having a pressure valve for what would otherwise be a rapidly (too fast?) growing population. To put it bluntly, Kiribati has a problem of too many people (particularly on crowded southern Tarawa); Marshall Islands, if it has a population problem, is one of too few. The contrast of crowded, relatively poor Tarawa and less-crowded, relatively well-off Majuro is an obvious and stark one to anyone travelling to them both in quick succession.</p>

<h2 id="pacific-people-are-overseas-in-very-considerable-numbers">Pacific people are overseas in very considerable numbers</h2>

<p>OK, so people have been moving from the Pacific islands to elsewhere for decades or longer. Proportionately speaking, does this matter? Have large numbers of Pacific islanders cumulatively ended up elsewhere? The following chart answers this with a resounding “yes”, for at least nine countries:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0313-diaspora-bar.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0313-diaspora-bar.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>For seven countries, there are more people ethnically associated with that country living in the USA, New Zealand and Australia than in the origin country. For the countries in the bottom row &#8211; the three New Zealand Realm countries plus Pitcairn &#8211; there are many times more people living overseas than at home. Around 40,000 Niueans live overseas (mostly in New Zealand) and fewer than 2,000 in Niue. For Tonga, Samoa and Marshall Islands the situation is not as extreme but still very substantial.</p>

<p>Illustrating this further, consider this chart of the world’s largest Pacific Islander cities. Thanks to comments on LinkedIn I have improved this from previous versions—we now have a better estimate of people with Pacific Islander as one ‘race’ of several in Hawaii, and a more comparable definition of Greater Wellington. But what we see is broadly the same message as before—of the top ten Pacific Islander cities in the world, two are in Australia (Sydney and Brisbane) and one each in New Zealand (Auckland) and the USA (Honolulu).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0312-pacific-cities-revised.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0312-pacific-cities-revised.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>All this has substantial implications for cultural and national identity, for economics and for politics. What political rights and expectations are there for these overseas people? For the home country, is there a minimum size that is viable? Or even a minimum proportion of your people? What happens legally, politically and diplomatically (for example, to votes in the United Nations General Assembly) if—as is clearly possible under climate change—a low-lying coral atoll country like Tokelau, Marshall Islands or Tuvalu loses all of its land to sea-level rise and <em>all</em> of its people are living overseas?</p>

<h2 id="remittances-are-a-critical-even-dominant-part-of-many-but-not-all-pacific-island-economies">Remittances are a critical, even dominant, part of many (but not all) Pacific island economies</h2>

<p>Without going to these extreme scenarios we have enormous economic implications in the here and now. One of the strongest indicators of this is the level of remittances. Remittances are payments from family or other contacts overseas, typically in a higher income country. The source of remittances can be people on relatively short trips overseas—in the Pacific, examples include people in the Pacific Australia Labour Mobility scheme or the New Zealand Recognised Seasonal Employer scheme—or from long term migrants who have made the other country their indefinite home.</p>

<p>We see from this final chart that some Pacific island countries have extraordinarily high levels of remittances compared to averages of comparable countries, including other small states:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0315-remittances-bar.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0315-remittances-bar.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Interestingly, the three highest countries on this measure are <em>not</em> on my list of countries with special access to permanent residency in a large rich country. But as previously mentioned, Samoa and Tonga have particularly strong ties to New Zealand that go partway towards such special access. Vanuatu does not, but it is the beneficiary of short term labour schemes. It&#8217;s also possible that remittances are under-reported in some countries.</p>

<p>For countries like Tonga, it seems likely or at least possible that remittances are coming from long term migrants in New Zealand. After a generation or three, will they stop sending money back to Tonga? Possibly, but as we have seen there is continual refreshment in the form of new migrants, and it is likely the remittances will continue for the foreseeable future.</p>

<p>One thing is clear&#8212;labour mobility is an enduring feature of the Pacific region that meets multiple groups&#8217; needs and leaves a decisive mark on both sending and receiving economies. Remittances are just the most direct part of the economic impact; others include investment ties and human capability development. But I won&#8217;t go into a literature review on this area.</p>

<h2 id="there-are-important-policy-implications">There are important policy implications</h2>

<p>Why do we care? There are important implications of all this. The context of my original talk was a meeting bringing together heads of National Statistical Offices with heads of national planning, and I wanted to highlight population issues as one area where the links between official statistics and national planning and policy are (or should be) particularly strong. Some of the key economic and planning issues in this regard as I see them include:</p>

<ul>
  <li>Population projections (including at sub-national level) and the obvious implications for infrastructure and related planning (if and where to build roads, medical centres, schools, etc.)</li>
  <li>Impacts of migration (short and long term) on the working-age population back at home and what planning or policy levers are needed to deal with this, not just lament it</li>
  <li>Social impacts of many working-age people, perhaps disproportionately of one gender, being overseas</li>
  <li>Implications for taxation policy of a large proportion of national income coming in remittances</li>
  <li>Impact of migration (in either direction) on the net fiscal position (is it working-age people coming to our country, paying more tax than they extract in benefits, or vice versa?)</li>
  <li>Specifically, where do retirees end up, and who pays for them?</li>
  <li>Impact of returnees from overseas work experiences—including raised skills and experiences, higher taxation payments, and general impact on national capabilities and capacity</li>
  <li>Impacts on equality back home from different types of people in the diaspora</li>
  <li>What labour market and cultural context should education be preparing young people for?</li>
</ul>

<p>This aims to be an indicative, rather than comprehensive, list. After all, my aim here is to highlight some issues, not venture into policy advice that I have no mandate for and which would require much more systematic evaluation of options and root causes of the problems.</p>

<h2 id="other-posts-in-this-series">Other posts in this series</h2>

<p>The seven blog posts in total in this series are set out below. The first six contain R code and data sources for each chart:</p>
<ul>
  <li><a href="https://freerangestats.info/blog/2025/11/30/pacific-population" rel="nofollow" target="_blank">Visual summaries of population size and growth</a></li>
  <li><a href="https://freerangestats.info/blog/2025/12/04/pacific-net-migration" rel="nofollow" target="_blank">Net migration</a></li>
  <li><a href="https://freerangestats.info/blog/2026/03/01/pacific-pyramids" rel="nofollow" target="_blank">Population pyramids</a></li>
  <li><a href="https://freerangestats.info/blog/2026/02/16/pacific-cities" rel="nofollow" target="_blank">World cities with the most Pacific Islanders</a></li>
  <li><a href="https://freerangestats.info/blog/2026/02/18/pacific-diaspora" rel="nofollow" target="_blank">Pacific diaspora</a></li>
  <li><a href="https://freerangestats.info/blog/2026/03/08/remittances" rel="nofollow" target="_blank">Remittances</a></li>
  <li>Tying it all together (this post today)</li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/03/22/pacific-people-narrative"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/">Pacific island demography, the narrative by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400022</post-id>	</item>
	</channel>
</rss>
